Sometimes, scraping data from a primary webpage is only half the job. Each item in a table or list may contain a link leading to a detailed page, like a product info page, location details, or user profile.
With Lutra, you can perform a "two-step" (nested) scrape:
Scrape a list of items (and their links) from a primary source.
Follow each item's link to gather deeper, more detailed information.
This two-step approach captures more comprehensive data than a single pass, and Lutra's ability to "follow" links makes it far easier than doing the same with typical scrapers.
Extract the Main Listing
Before you can do multi-layer extraction, you'll first need to extract the initial list of items and their links. If you're new to this or have multiple pages, check out our article on Scraping Multi-Page Websites for best practices.
Identify the listing page and the data fields you need (e.g., name, price).
Instruct Lutra to extract each item's main info plus the link to its detail page.
Save the data in a spreadsheet (Google Sheets, Excel, CSV, etc.) and ensure you have a column with links.
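Lutra does all of this from plain-language instructions, so no code is needed. If it helps to picture the result, here is a minimal Python sketch of what step one produces: one row per item, including a Link column. The listing HTML, the "item" class, the data-price attribute, and the example.com address are all invented for illustration; this is not how Lutra works internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical listing markup, made up for this illustration.
LISTING_HTML = """
<ul>
  <li><a class="item" href="/products/101" data-price="19.99">Desk Lamp</a></li>
  <li><a class="item" href="/products/102" data-price="34.50">Office Chair</a></li>
</ul>
"""
BASE_URL = "https://example.com/catalog"  # assumed address of the listing page


class ListingParser(HTMLParser):
    """Collects one row (Name, Price, Link) per <a class="item"> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []
        self._current = None  # row being built while inside an item link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item":
            self._current = {
                "Name": "",
                "Price": attrs.get("data-price") or "",
                # Resolve relative hrefs so the Link column holds full URLs.
                "Link": urljoin(self.base_url, attrs.get("href") or ""),
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["Name"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def extract_listing(html, base_url):
    """Return the listing as a list of rows, ready to write to a sheet."""
    parser = ListingParser(base_url)
    parser.feed(html)
    return parser.rows


rows = extract_listing(LISTING_HTML, BASE_URL)
```

The key point is the Link column: each row carries a full URL, which is what the second step reads.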
Follow Each Link for More Details
Once you have a sheet of items and their detail-page links, you can have Lutra open each link in turn to gather additional data.
Use Your Existing Spreadsheet
Instruct Lutra to read the column that contains the detail-page links: "Use the Link column, and read the webpage there to get the item's information."
Define the Fields to Extract
For example: "Scrape each item's extended description, user ratings, and shipping details."
Output to a New Sheet (or the Same One)
You can create a fresh spreadsheet or add new columns to your existing one.
If you use a separate sheet, add a unique identifier (e.g., a product ID) to both sheets so the two datasets can be linked together.
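Conceptually, the second step is a loop over the link column. The Python sketch below is only an illustration of that idea, not Lutra's implementation: follow_links, fake_fetch, the column names, and the sample data are all hypothetical, and the page visit is replaced by a lookup in canned data so the example runs without a network connection.

```python
def follow_links(listing_rows, fetch_details, link_column="Link", id_column="Product ID"):
    """Visit each row's link and return one detail row per item.

    fetch_details is any function that takes a URL and returns a dict of
    extracted fields; the shared ID is copied into every detail row.
    """
    detail_rows = []
    for row in listing_rows:
        details = fetch_details(row[link_column])
        detail_rows.append({id_column: row[id_column], **details})
    return detail_rows


# Stand-in for real detail pages (invented values).
FAKE_PAGES = {
    "https://example.com/products/101": {"Description": "Adjustable LED lamp", "Rating": "4.6"},
    "https://example.com/products/102": {"Description": "Ergonomic mesh chair", "Rating": "4.2"},
}


def fake_fetch(url):
    """Pretend to open a detail page by looking it up in FAKE_PAGES."""
    return FAKE_PAGES[url]


listing = [
    {"Product ID": "101", "Name": "Desk Lamp", "Link": "https://example.com/products/101"},
    {"Product ID": "102", "Name": "Office Chair", "Link": "https://example.com/products/102"},
]

details = follow_links(listing, fake_fetch)
```

Because each detail row keeps the Product ID from its listing row, the output can go into a new sheet and still be matched back to the original items.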
Combine or Cross-Reference Data
Same Sheet or Separate Sheets: It's often easiest to keep your main list in one sheet (listing info + link) and detailed info in another.
Linking Rows: If you choose separate sheets, ensure both contain a shared ID so you can cross-reference easily.
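To see why the shared ID matters, here is a small Python sketch of cross-referencing two sheets. The sheet contents and the join_on_id helper are hypothetical examples; in a spreadsheet you would typically get the same effect with a lookup formula on the ID column.

```python
def join_on_id(listings, details, id_column="Product ID"):
    """Attach detail fields to each listing row that shares the same ID.

    Listing rows without a matching detail row are kept unchanged.
    """
    details_by_id = {row[id_column]: row for row in details}
    combined = []
    for row in listings:
        match = details_by_id.get(row[id_column], {})
        combined.append({**row, **match})
    return combined


# Invented sample data standing in for the two sheets.
product_listings = [
    {"Product ID": "101", "Name": "Desk Lamp", "Price": "19.99"},
    {"Product ID": "102", "Name": "Office Chair", "Price": "34.50"},
    {"Product ID": "103", "Name": "Monitor Stand", "Price": "27.00"},
]
product_details = [
    {"Product ID": "102", "Rating": "4.2"},
    {"Product ID": "101", "Rating": "4.6"},
]

combined = join_on_id(product_listings, product_details)
```

Note that the rows match correctly even though the two sheets are in a different order, and the item with no detail row (ID 103) is still kept. Without a shared ID, you would have to rely on row position, which breaks as soon as one sheet is sorted or filtered.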
Tip: Cross-reference data by asking Lutra to use the same ID across different sheets:
"Create a column titled 'Product ID' in both the 'Product Listings' and 'Product Details' sheets to link details to their respective items."
Save as a Playbook
Create Playbook: After you confirm your two-step scraping approach works, click "Create Playbook".
Name & Document: Give it a clear title (e.g., "Two-Step Product Scrape") and a short description so you or your teammates can quickly understand it.
Reuse or Automate: Run this Playbook anytime you need a fresh round of data. You can even schedule it if your source content updates regularly.
Tips & Best Practices
Test on a Few Items First: Make sure the detail links work and the data looks correct before scaling.
Be Explicit: If detail pages have tabs or nested sections, specify which parts you want extracted.