Pagination

Pagination lets a connector retrieve large datasets from an API in manageable chunks. The Lumos SDK provides a standardised approach to pagination across different APIs, which keeps data retrieval efficient and behaviour consistent.

Key principles

Handle Large Datasets: Efficiently retrieve and process large amounts of data from APIs.
Optimize Performance: Reduce load on both the client and server by fetching data in smaller chunks.
Maintain Consistency: Provide a uniform interface for pagination across different API implementations.
Pagination Behaviour: Ensure that the page token returned is None when Lumos should stop calling the endpoint.
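
The stop condition in the last principle can be illustrated with a small, SDK-independent sketch. `FakePage`, `fetch_page` and `fetch_all` are hypothetical names invented for this illustration and are not part of the Lumos SDK; the point is only that the caller keeps requesting pages until the returned token is None.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakePage:
    """Hypothetical stand-in for one page of results plus the next token."""
    items: list[int]
    next_token: Optional[str]


def fetch_page(token: Optional[str], page_size: int = 2) -> FakePage:
    """Hypothetical endpoint that serves five items, two at a time."""
    data = [1, 2, 3, 4, 5]
    start = int(token) if token else 0
    end = start + page_size
    # A None token is the signal that nothing is left to fetch.
    next_token = str(end) if end < len(data) else None
    return FakePage(items=data[start:end], next_token=next_token)


def fetch_all(fetch: Callable[[Optional[str]], FakePage]) -> list[int]:
    """Call the endpoint repeatedly until it returns a None token."""
    items: list[int] = []
    token: Optional[str] = None
    while True:
        page = fetch(token)
        items.extend(page.items)
        if page.next_token is None:
            break
        token = page.next_token
    return items
```

If the endpoint kept returning a non-None token after the data ran out, this loop would never terminate, which is why the None token matters.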

Implementing the pagination class

Pagination in Lumos SDK typically involves two main components:

A Pagination class that represents the current page state.
A NextPageToken class that serializes the page state above into a token string and deserializes it again.

# Example: abc/pagination.py
from connector.utils.pagination import (
    NextPageTokenInterface,
    PaginationBase,
    create_next_page_token,
)

DEFAULT_PAGE_SIZE = 100

class AbcPagination(PaginationBase):
    resource_type: str | None = None
    page: int

    @classmethod
    def default(cls, resource_type: str) -> "AbcPagination":
        return cls(
            resource_type=resource_type,
            page=1,
        )

    def next_page(self) -> "AbcPagination":
        return AbcPagination(
            resource_type=self.resource_type,
            page=self.page + 1,
        )

AbcNextPageToken = create_next_page_token(AbcPagination, "AbcNextPageToken")

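# Page is the SDK's page model; it must be imported in this module as well.
def make_page(next_paginations: list[AbcPagination], size: int) -> Page | None:
    next_page_token = AbcNextPageToken.from_paginations(next_paginations)
    # Returning None (no token) tells Lumos to stop calling the endpoint.
    return Page(token=next_page_token.token, size=size) if next_page_token.token else None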

Recommendations and requirements to consider:

PaginationClass: Represents the current page state (e.g., AbcPagination).
- Includes any relevant fields like resource_type and page which you might need to paginate by.
  - Note: The resulting token string is compressed and can carry whatever parameters you need to filter or paginate by on a per-capability basis. Keep the token small, because all HTTP request parameters have size constraints.
- Implements a default method that creates the initial pagination state.
- Implements a next_page method that generates the next page state.
NextPageTokenClass: Handles serialization and deserialization of pagination information.
- Created using the create_next_page_token utility function.
- Provides methods to convert between token strings and Pagination objects.
Helper functions, while not mandatory, are often added to:
- Simplify the creation of Page objects (e.g., make_page function).
- Handle specific pagination logic for the API.
- Provide utility methods for pagination-related operations.
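
To make the serialization role of the token class more concrete, here is a minimal sketch of how a list of page states could be turned into a compressed string and back. It is not the SDK's actual implementation: `SketchPagination`, `encode_token` and `decode_token` are hypothetical names, and the JSON, zlib and URL-safe base64 combination is an assumption chosen only because it uses the standard library.

```python
import base64
import json
import zlib
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SketchPagination:
    """Hypothetical page state with the same fields as AbcPagination."""
    resource_type: str
    page: int


def encode_token(paginations: list[SketchPagination]) -> Optional[str]:
    """Serialize page states to a compact string; None when there are none."""
    if not paginations:
        return None
    raw = json.dumps([asdict(p) for p in paginations]).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_token(token: Optional[str]) -> list[SketchPagination]:
    """Rebuild the page states; a missing token yields an empty list."""
    if not token:
        return []
    raw = zlib.decompress(base64.urlsafe_b64decode(token.encode("ascii")))
    return [SketchPagination(**item) for item in json.loads(raw)]
```

Every field added to the state makes the encoded string longer, so the size constraint on request parameters applies directly to what you store.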

Using Pagination in Capabilities

The following example shows how a capability reads the incoming page, fetches data, and returns the next page:

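# Example: abc/capabilities_read.py
from connector.oai.capability import get_page

# Assumes the pagination module from the previous example sits next to this file.
from .pagination import DEFAULT_PAGE_SIZE, AbcNextPageToken, AbcPagination, make_page

# ListAccountsRequest, ListAccountsResponse and AbcClient are assumed to be imported as well.

async def list_accounts(args: ListAccountsRequest) -> ListAccountsResponse:
    page = get_page(args)
    page_size = page.size or DEFAULT_PAGE_SIZE

    try:
        # Resume from the state encoded in the incoming token.
        current_pagination = AbcNextPageToken(page.token).paginations()[0]
    except IndexError:
        # No state in the token: start from the first page.
        # "accounts" is an illustrative resource type.
        current_pagination = AbcPagination.default(resource_type="accounts")

    async with AbcClient(args) as client:
        params = {
            "page": current_pagination.page,
            "perPage": page_size,
        }
        response = await client.getUsers(params=params)
        data = response.json()
        # to_accounts() stands in for your own mapping from payload to accounts.
        accounts = data.to_accounts()

    # Assumption: a full page means more data may follow. Prefer an explicit
    # signal from the API if it provides one.
    more_data_available = len(accounts) == page_size

    if more_data_available:
        next_pagination = [current_pagination.next_page()]
    else:
        next_pagination = []

    # An empty list results in no page, which ends pagination.
    next_page = make_page(next_pagination, page_size)

    return ListAccountsResponse(
        response=accounts,
        page=next_page,
    )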

Key considerations

The get_page(args) helper function is used to retrieve pagination information from the request arguments.
page_size is determined using the page size from the request or a default value.
The same page_size is used across all pages to maintain consistency.
Instead of using template strings for endpoints, the endpoint is typically stored in the pagination object or derived from the resource type or a custom argument.
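
As a rough illustration of the last consideration, the sketch below derives the endpoint and request parameters from the pagination state instead of formatting a URL template. The `ENDPOINTS` mapping, its paths, and the `build_request` helper are hypothetical; only the "page" and "perPage" parameter names are taken from the example above.

```python
from dataclasses import dataclass

# Hypothetical mapping from resource type to endpoint path.
ENDPOINTS = {
    "accounts": "/users",
    "groups": "/groups",
}


@dataclass(frozen=True)
class SketchPagination:
    """Hypothetical page state keyed by resource type."""
    resource_type: str
    page: int = 1


def build_request(pagination: SketchPagination, page_size: int) -> tuple[str, dict[str, int]]:
    """Return the endpoint and query parameters for the given page state."""
    endpoint = ENDPOINTS[pagination.resource_type]
    # The same page_size is sent for every page to keep paging consistent.
    params = {"page": pagination.page, "perPage": page_size}
    return endpoint, params
```

Because the state alone determines the request, resuming from a token needs no extra context beyond the page size.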

Handling Multiple Endpoint Pagination

For capabilities that call several API endpoints internally, you can rely on the fact that the pagination token holds a list of pagination states, one per endpoint:

# Example: abc/capabilities_read.py:list_accounts
# ...
page = get_page(args)
page_size = page.size or DEFAULT_PAGE_SIZE

# Nothing is indexed here, so IndexError cannot signal a missing token.
# Assumption: paginations() returns an empty list when no token was sent.
current_paginations = AbcNextPageToken(page.token).paginations() or [
    AbcPagination.default(endpoint="/first-endpoint")
]

async with AbcClient(args) as client:
    for pagination in current_paginations:
        params = {
            "page": pagination.page,
            "perPage": page_size,
        }
        response = await client.get(pagination.endpoint, params=params)
        # ...
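
One way to build the next list of states after such a loop is sketched below: endpoints that still have data advance by one page, and exhausted endpoints are dropped. `EndpointPagination`, `advance`, the `has_more` mapping and the "/second-endpoint" path are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EndpointPagination:
    """Hypothetical page state that remembers which endpoint it belongs to."""
    endpoint: str
    page: int = 1


def advance(
    current: list[EndpointPagination],
    has_more: dict[str, bool],
) -> list[EndpointPagination]:
    """Advance endpoints that have more data and drop the exhausted ones."""
    return [
        replace(pagination, page=pagination.page + 1)
        for pagination in current
        if has_more.get(pagination.endpoint, False)
    ]
```

Once every endpoint is exhausted the result is an empty list, which is the case where no further page should be returned.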

Storing Additional Information in the Pagination Tokens

In some cases, it's necessary to store additional data across pages to ensure data consistency or uniqueness.

class AbcPagination(PaginationBase):
    endpoint: str
    page: int
    unique_ids: list[str]

    @classmethod
    def default(cls, endpoint: str) -> "AbcPagination":
        return cls(endpoint=endpoint, page=1, unique_ids=[])

In this example, the pagination state carries a list of unique IDs. This helps when an API returns duplicate entries across pages and you want to avoid repeating the extra detail call for an entry you have already processed.
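
The deduplication idea might look roughly like the following sketch. `SketchPagination` and `filter_new` are hypothetical names, and the assumption that each entry is a dict with an "id" key is made only for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchPagination:
    """Hypothetical page state that carries the IDs seen so far."""
    page: int = 1
    unique_ids: list[str] = field(default_factory=list)


def filter_new(
    entries: list[dict],
    pagination: SketchPagination,
) -> tuple[list[dict], SketchPagination]:
    """Return the entries not seen before, plus the state for the next page."""
    seen = set(pagination.unique_ids)
    fresh: list[dict] = []
    for entry in entries:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            fresh.append(entry)
    next_state = SketchPagination(
        page=pagination.page + 1,
        unique_ids=pagination.unique_ids + [entry["id"] for entry in fresh],
    )
    return fresh, next_state
```

Only the entries in the first return value would need a detail call; the second value is what gets encoded into the next token.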

Best Practices

Store Minimal Data: Only store essential information in the pagination token. In the example, only unique identifiers are stored, not entire objects.
Use Efficient Data Structures: Choose data structures that balance functionality and size. For example, using a list of strings for unique IDs instead of more complex objects.
Limit Data Volume: If the amount of data grows too large, consider implementing a cutoff or using a more efficient storage method.
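
A cutoff can be as simple as keeping only the most recent IDs before they are written into the token. The sketch below assumes a hypothetical `MAX_TRACKED_IDS` limit and `cap_ids` helper; the right limit depends on how long your IDs are and how much room the token has.

```python
# Hypothetical limit; tune it to your ID length and token size budget.
MAX_TRACKED_IDS = 500


def cap_ids(unique_ids: list[str], limit: int = MAX_TRACKED_IDS) -> list[str]:
    """Keep only the most recently added IDs, up to the given limit."""
    if limit <= 0:
        return []
    return unique_ids[-limit:]
```

The trade-off is that an ID pushed out of the window can be treated as new again, so this fits best when duplicates tend to appear close together.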