🕸️ Scraping Multi-Page Websites with Lutra

Lutra can extract content spread across multiple pages when each page is reachable through a URL parameter. By recognizing the URL pattern, it can efficiently process every page in the series.

Written by Lutra
Updated over 2 months ago

When data is spread over multiple pages—like product listings, event schedules, job postings, or forum threads—you want to capture everything rather than just Page 1. Lutra’s ability to handle multi-page scraping saves time and ensures you end up with a complete data set for deeper analysis.


Identify the Page Pattern

  1. Navigate to the site and locate at least two different page URLs.

  2. Compare the URLs to see how the page parameter changes. For example:

    • http://example.com/products?page=1

    • http://example.com/products?page=2

  3. Verify continuity: confirm that simply incrementing the page number (e.g., ?page=3) continues to reveal new data.

Example patterns (see the sketch after this list for how such a pattern can be enumerated):

  • ?page=1, ?page=2

  • &pg=1, &pg=2

  • /page/1/, /page/2/
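
To make the pattern concrete, here is a minimal Python sketch of how a paginated URL series can be enumerated. You don’t need to run anything like this yourself—Lutra handles it from your instructions—and the base URL and page range below are placeholder assumptions:

```python
# Minimal sketch (illustration only): build the full list of page URLs by
# substituting the page number into the pattern. BASE_URL and the page
# range are placeholder assumptions.
BASE_URL = "http://example.com/products?page={page}"

def paginated_urls(first_page: int, last_page: int) -> list[str]:
    return [BASE_URL.format(page=n) for n in range(first_page, last_page + 1)]

print(paginated_urls(1, 5))
# ['http://example.com/products?page=1', ..., 'http://example.com/products?page=5']
```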


Start a New Chat with Lutra

  1. Open Lutra and create a new chat session.

  2. Paste the URLs for the first few pages (e.g., page 1 and 2).

  3. Explain what data you want to scrape (lists, tables, product info, etc.).

Example instruction:

“Please scrape the list of products, their prices, and links from these two pages:

http://example.com/products?page=1

http://example.com/products?page=2

Notice the page parameter. I want you to scrape pages 1 through 5 using that same format.”

Tip: If Lutra’s extracted data looks off, give it direct feedback:

“You missed the product ASIN code. Please be sure to extract that and try again.”


Test on a Small Number of Pages

  • Start small by having Lutra scrape just Pages 1–3.

  • Check the results in the Chat view or in your chosen output—this is a great time to ask Lutra to place the data in a Google Sheet or Excel file.

  • Provide feedback: if Lutra misses any fields or if you need more or less data, let Lutra know and it can refine its approach.

  • Refine until the output matches your expectations.

Tip: You can say, “Save the scraped data in a new Google Sheet with columns for Product Name, Price, and Link.” This lets you easily review the data before scraping more pages.


Scale Up

  1. Define the page range: once Lutra succeeds on a few pages, specify how far you want to go—for example, pages 1 through 20.

  2. Ensure consistency: confirm the website continues to follow the same URL pattern for all pages.

  3. Run the broader scrape: Lutra will systematically request each page according to the pattern you provided.

  4. Export your data: if you haven’t already, you can now have Lutra write all the scraped data into one CSV, Excel file, or Google Sheet for final checks.

Tip: If you’re not sure how many pages exist, you can estimate a maximum, or ask Lutra to stop scraping once it detects no new data.
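
If you’re curious what “stop once it detects no new data” means in practice, here is a rough, hypothetical Python sketch of that stopping condition. The scrape_page function is a placeholder for whatever fetches and parses one page; this is not how Lutra is implemented internally:

```python
# Hypothetical sketch of "stop once no new data appears".
# scrape_page() is a placeholder; it stands in for whatever fetches and
# parses one page and returns the list of records found on that page.
def scrape_page(page_number: int) -> list[dict]:
    raise NotImplementedError("placeholder for illustration")

def scrape_until_empty(max_pages: int = 100) -> list[dict]:
    all_items: list[dict] = []
    for page in range(1, max_pages + 1):
        items = scrape_page(page)
        if not items:  # an empty page usually signals the end of the listing
            break
        all_items.extend(items)
    return all_items
```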


Save as a Playbook

  1. Click “Create Playbook” in Lutra after you’re satisfied with the multi-page scraping approach.

  2. Reuse or schedule the Playbook to automate multi-page scraping in the future, without re-specifying the same instructions.


Organizing the Data in a Spreadsheet

Once Lutra finishes scraping, you can do one of the following:

  1. Save to a New Google Sheet

    • In the same chat, simply say:

      “Please put the final data into a Google Sheet with columns: Product Name, Price, Link.”

    • Lutra will create a new sheet or update an existing one for easy review.

  2. Download an Excel File

    • Ask Lutra to export the scraped results as an Excel file. It can then provide you with a downloadable link or attachment.

    • You can open the file in Excel (or any compatible spreadsheet program) to quickly filter, sort, or format the data.

  3. Use CSV or JSON

    • If you prefer a more developer-friendly format, Lutra can generate a CSV or JSON file that you can import into databases or other tools (see the sketch below for what these files typically look like).
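
As a rough illustration of what those developer-friendly formats contain, here is a small Python sketch that writes a couple of example records to CSV and JSON using the standard library. The column names, file names, and sample values are assumptions made for this example, not real scraped output:

```python
import csv
import json

# Example records shaped like the scraped output; the column names,
# file names, and values below are assumptions for illustration only.
records = [
    {"Product Name": "Widget A", "Price": "9.99", "Link": "http://example.com/a"},
    {"Product Name": "Widget B", "Price": "14.50", "Link": "http://example.com/b"},
]

# CSV: a header row followed by one row per record.
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Product Name", "Price", "Link"])
    writer.writeheader()
    writer.writerows(records)

# JSON: the same records as an array of objects.
with open("products.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)
```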

Reviewing your scraped data in spreadsheet format makes it easy to spot-check for completeness and accuracy. If you notice anything missing or incorrect, provide feedback to Lutra directly in the chat, and it can adjust how it’s scraping or how it’s formatting the output.
