pagination-in-rest-apis

“`html

When working with an extensive dataset, RESTful APIs typically paginate results to prevent overwhelming the system with excessive data in a single request. Power Query in Power BI serves as an excellent instrument to query and handle such paginated API responses. Pagination ensures that data is delivered in manageable portions using parameters such as page, offset, or limit. In this article, let us delve into how Power Query in Power BI manages paginated API responses.

Table of Contents:

What is a Paginated Response?

A paginated response refers to a method used by APIs to divide large datasets into smaller subsets. The API doesn’t send all data in one response; it provides a restricted set of results along with metadata for the next set, allowing you to request additional data as necessary.

The pagination metadata comprises:

  • Current page number
  • Total page count
  • Next page URL

Benefits of Pagination

  • Enhanced Performance: Pagination retains large datasets within a table, preventing the sluggish performance that emerges when handling significant data in a single request.
  • Controlled Server Load: By segmenting large data into smaller pieces, the server can more easily manage requests, reducing the chances of being overwhelmed.
  • Better User Experience: Pagination allows for quicker loading times and more seamless user interactions, thereby enhancing user experience.
  • Efficient Memory Use: It minimizes memory consumption by loading smaller data batches at once.
  • Lower Network Traffic: Paginated responses decrease the volume of data transmitted in a single request, which reduces network load and optimizes response times.

Accessing RESTful APIs via Power Query

Power Query is a robust tool that enables you to extract, transform, and clean data from assorted sources. One of its most prevalent applications is querying RESTful APIs. When working with a significant dataset, these APIs provide data in a paginated format. Power BI effectively handles such paginated responses, allowing for the processing of extensive data volumes.

Custom Paging Function in Power Query

Power Query lets us develop a custom function to manage dynamic pagination, enabling data retrieval from the API on a per-page basis.

To comprehend this function in detail, we will reference an example API from the U.S. Treasury Reporting Rates of Exchange.

Building The GetPage Function

We will establish a custom function termed GetPage, which accepts a page number as an input and retrieves data for that specific page.

1. Launch Power Query Editor: Navigate to the “Home” tab and select “Transform Data” to open the Power Query Editor.

Open Power Query Editor Output

2. Create a Blank Query: In the “Home” tab, click on “New Source>”Blank Query”.

Create a Blank Query Output

3. Access Advanced Editor: Select “Advanced Editor” in the home tab to input custom code.

Open Advanced Editor Output

4. Input the code: Paste the following code in the tab that appears after selecting Advanced Editor, and name it GetPage.

(page as text) =>
  let
    Source = try
      Json.Document(
        Web.Contents(
          "https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?page[number]="
            & page
            & "&fields=country_currency_desc,exchange_rate,record_date"
        )
      )
    otherwise
      [data = null, meta = null, links = null], 
    #"Converted to Table" = Table.FromRecords({Source}), 
    #"Removed Columns" = Table.RemoveColumns(#"Converted to Table", {"meta", "links"}), 
    #"Expanded data List" = Table.ExpandListColumn(#"Removed Columns", "data"), 
    #"Expanded data Records" = Table.ExpandRecordColumn(
      #"Expanded data List", 
      "data", 
      {"country_currency_desc", "exchange_rate", "record_date"}, 
      {"data.country_currency_desc", "data.exchange_rate", "data.record_date"}
    )
  in
    #"Expanded data Records"
Write the code Output

This GetPage function will retrieve data from the designated page number.

Methods for Retrieving Pages

Now that we have the GetPage function, we can access all pages by iterating through them.

Method 1: When Page Count is Known

The API provides the total page count. Power Query will loop through all pages to gather data.

Steps to gather all pages:

  • Open the Power Query Editor
  • Create a Blank Query
  • Paste the subsequent code in the Advanced Editor

Code for Method

“““html
1:

let
  Source = try
    Json.Document(
      Web.Contents(
        "https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?fields=country_currency_desc,exchange_rate,record_date"
      )
    )
  otherwise
    [data = null, meta = null, links = null], 
  #"Page List" = 
    if Source = null or Source[meta] = null then
      {null}
    else
      {1 .. Source[meta][#"total-pages"]}, 
  #"Converted to Table" = Table.FromList(
    #"Page List", 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "PAGES"}}), 
  #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"PAGES", type text}}), 
  #"Invoked Custom Function" = Table.AddColumn(
    #"Changed Type", 
    "GetPage", 
    each if [PAGES] <> null then GetPage([PAGES]) else null
  ), 
  #"Expanded GetPage" = Table.ExpandTableColumn(
    #"Invoked Custom Function", 
    "GetPage", 
    {"data.country_currency_desc", "data.exchange_rate", "data.record_date"}, 
    {"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
  )
in
  #"Expanded GetPage"

Upon executing the query, you will receive a table containing all data from both tables.

Output:

Code for Method 1 Output

Method 2: Unknown Page Count; Known Entry Count

In cases where the total number of entries and entries per page are specified but the overall page count is not, we can compute the number of pages.

let
  Source = try
    Json.Document(
      Web.Contents(
        "https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/rates_of_exchange?fields=country_currency_desc,exchange_rate,record_date"
      )
    )
  otherwise
    [data = null, meta = null, links = null], 
  #"Page List" = 
    if Source = null or Source[meta] = null then
      {null}
    else
      {1 .. Number.RoundUp(Source[meta][#"total-count"] / Source[meta][#"count"])}, 
  #"Converted to Table" = Table.FromList(
    #"Page List", 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "PAGES"}}), 
  #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"PAGES", type text}}), 
  #"Invoked Custom Function" = Table.AddColumn(
    #"Changed Type", 
    "GetPage", 
    each if [PAGES] <> null then GetPage([PAGES]) else null
  ), 
  #"Expanded GetPage" = Table.ExpandTableColumn(
    #"Invoked Custom Function", 
    "GetPage", 
    {"data.country_currency_desc", "data.exchange_rate", "data.record_date"}, 
    {"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
  )
in
  #"Expanded GetPage"

Output:

Entry Count is Known Output

Method 3: Unknown Page Count and Entry Count

In this scenario, neither the total pages nor entries are specified, prompting us to utilize List. Generate to retrieve pages until no further data is available.

let
  Source = List.Generate(
    () => [PAGE = 1, RESULTS = GetPage("1")], 
    each Table.RowCount(
      Table.SelectRows(_[RESULTS], each not List.Contains(Record.ToList(_), null))
    )
      > 0, 
    each [PAGE = _[PAGE] + 1, RESULTS = GetPage(Number.ToText([PAGE] + 1))], 
    each _[[PAGE], [RESULTS]]
  ), 
  #"Converted to Table" = Table.FromList(
    Source, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Expanded Column" = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {"PAGE", "RESULTS"}, 
    {"PAGES", "GetPage"}
  ), 
  #"Changed Type" = Table.TransformColumnTypes(#"Expanded Column", {{"PAGES", type text}}), 
  #"Expanded GetPage" = Table.ExpandTableColumn(
    #"Changed Type", 
    "GetPage", 
    {"data.country_currency_desc", "data.exchange_rate", "data.record_date"}, 
    {"GetPage.data.country_currency_desc", "GetPage.data.exchange_rate", "GetPage.data.record_date"}
  )
in
  #"Expanded GetPage"

Output:

Both Page Count and Entry Count are Unknown Output

Performance Evaluation

Method Performance Efficiency Best for
Method 1: Known Page Count Fastest efficiency. Directly utilizes total-pages metadata for iteration. Optimal for APIs that disclose total pages.
Method 2: Known Entry Count Performance is slightly less rapid than Method 1. Determines total pages using total-count and count. Ideal for APIs that report total count but not page count.
Method 3: Unknown Counts Performance is the slowest. Dynamically generates pages, verifying data existence. Best suited for APIs lacking total counts and pages.

Recommended Practices

  • Verify Existing Metadata: Review the API’s documentation to confirm the presence of metadata.
  • Manage API Limitations and Errors: Ensure error handling and retries are implemented in Power Query if your API imposes limitations. You can employ try… otherwise expressions to capture and address errors.
  • Optimize Query Performance: Request larger pages to lower the overall number of requests, thus minimizing API calls. Conduct performance tests when increasing data set sizes.
  • Cache API Responses: For APIs that don’t frequently change, consider storing data locally or in a database to reduce the number of requests.

Summary

Power
“““html

Query presents multiple approaches for managing paginated data, each method possessing unique advantages. Method 1 (Known Page Count) is the most effective when the total page count is specified, whereas Method 2 (Unknown Page Count with Known Entry Count) is suitable when only the total entry count is accessible. Method 3 (Both Counts are Unavailable) is the least efficient and should be employed when metadata is lacking.

To discover more regarding Power BI and its features, visit this Power BI Course and also review Power BI Interview Questions curated by industry professionals.

Managing Paginated REST API Data – FAQs

1. What does paginated data mean?

Paginated data refers to the practice of splitting larger datasets into smaller segments to alleviate congestion on the server or client side.

2. Why do APIs divide responses into pages?

APIs divide responses into pages to effectively handle substantial datasets and lessen server load by delivering data in smaller sections.

3. Is it possible for Power Query to automatically fetch all pages from an API?

Indeed, Power Query can automatically fetch all pages from an API by utilizing custom functions and iterative methods.

4. What is the ‘GetPage’ function used in Power Query?

The “GetPage” is a tailored Power Query function designed to retrieve individual pages of data.

5. What occurs if the total page count is not known?

If the total page count is not known, you can infer the number of pages based on the total entries available.

The article Pagination in REST APIs originally featured on Intellipaat Blog.

“`


Leave a Reply

Your email address will not be published. Required fields are marked *

Share This