🌐 Scraping lists and tables from a website

AI Web Scraping

Lutra uses AI for data extraction, making it far more versatile than traditional web scraping tools. Traditional web scrapers rely on fixed rules and patterns to collect data from websites, which can break if the site's structure changes. In contrast, Lutra's AI can understand and adapt to various website layouts and extract meaningful information from unstructured data. This means Lutra can handle a wide range of websites and data formats.

Common use cases

Lead Generation and Prospecting: Extract contact information like emails, phone numbers, and company names from directories (e.g., Conference Speakers, Exhibition Lists).
Competitive Analysis: Monitor competitor pricing, product offerings, and updates (e.g., Amazon, G2, Capterra).
E-commerce Insights: Gather product details, reviews, and pricing trends (e.g., eBay, Walmart).
Market Research: Scrape reports, trends, and consumer feedback from forums and blogs (e.g., Reddit, Trustpilot, G2, Capterra).
Content Aggregation: Compile articles, blog posts, or curated lists (e.g., Medium, WordPress blogs).
Real Estate Data Collection: Extract property listings, prices, and agent information (e.g., Zillow, Redfin, Realtor).
Academic Research: Collect tables, charts, and citations from government or academic sites (e.g., PubMed, ResearchGate).
News Tracking and Alerts: Monitor breaking news and specific topics (e.g., CNN).

Data formats

Lutra provides flexible output formats. You can ask it to write data directly to Google Sheets, Excel files, or documents, and to save them in a Drive folder. You can also ask it to produce structured output (JSON) for use in integrations. Contact us for more information on custom integrations with Lutra.

How to use Lutra

Start a new chat session with Lutra.
Give Lutra one or two examples of websites you'd like to work with, and tell it what data you want from them. For example:
1. "Please extract the top articles, their links, points, and number of comments from https://news.ycombinator.com/".
2. You can include more websites and detailed instructions about their structure. You do not need to describe the HTML structure; Lutra converts the website into a format adapted for AI extraction.
Refine the extraction with Lutra.
1. Ask Lutra to produce the results in a Google spreadsheet to easily see what data is extracted.
2. Provide feedback to Lutra on the extraction quality. You can tell it data it missed, how you want to format the output differently, and any changes you want to make.
Save the process as a Playbook after you have verified the extraction is working well.
Scale up the extraction to more websites by asking Lutra to process a list of websites (you can provide this via a spreadsheet). You can also automate the process by scheduling the playbook.
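
Before saving a Playbook, it can help to check the extracted rows for gaps so your feedback to Lutra is specific. The following is a minimal sketch, not part of Lutra itself. It assumes you have copied the rows out of the results spreadsheet into Python dictionaries, and the field names (title, link, points, comments) are hypothetical names chosen to match the Hacker News example above.

```python
from typing import Iterable

# Hypothetical field names for the Hacker News example; not a schema defined by Lutra.
REQUIRED_FIELDS = ("title", "link", "points", "comments")


def find_incomplete_rows(rows: Iterable[dict], required=REQUIRED_FIELDS):
    """Return (row_index, missing_fields) for every row with an absent or empty field."""
    problems = []
    for index, row in enumerate(rows):
        missing = []
        for name in required:
            value = row.get(name)
            # Treat a missing key, None, and whitespace-only text as "not extracted".
            if value is None or str(value).strip() == "":
                missing.append(name)
        if missing:
            problems.append((index, missing))
    return problems


# Illustrative sample data, not real extraction output.
sample_rows = [
    {"title": "Example story", "link": "https://example.com/a", "points": "120", "comments": "45"},
    {"title": "Another story", "link": "", "points": "80", "comments": None},
]

incomplete = find_incomplete_rows(sample_rows)
print(incomplete)  # [(1, ['link', 'comments'])]
```

The list of missing fields per row can be turned directly into feedback, such as telling Lutra which columns it missed and on which items.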

Tips for best results

Start Small: Begin by testing Lutra on one or two web pages to get familiar with how it interacts with different website structures.
Provide Feedback: Lutra uses AI to adapt and improve based on your feedback. If the initial results don’t meet your expectations, let Lutra know. It can fine-tune its approach, improving both the data it extracts and how it structures the output.
1. Provide feedback directly, for example:
  1. "You missed the data X, Y, Z, please try again."
  2. "The item IDs are formatted like XYZ-123; make sure you capture them."
Iterate: Run small-scale extractions to refine the scraping process until you are fully satisfied with the results. You can refine the fields, format, and data types used. Technical users can also inspect the underlying data schema that Lutra has chosen.
Scale: Once you’re happy with the results, you can scale up the process by asking Lutra to run on more pages.
1. You can provide it a set of URLs to process via a spreadsheet, or ask it to go through URLs in a specific pattern.
2. If you understand the URL format (e.g., URLs sometimes end with parameters like "page=1"), Lutra can vary that parameter to scale up the process. Tell it the pattern to use, provide an example, and it will follow it.
Schedule or Automate: After fine-tuning the setup, schedule or automate the process to ensure the data remains up-to-date without requiring constant manual input.
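
The feedback tip above mentions item IDs in a format like XYZ-123. If you want to spot-check a column of extracted IDs yourself before giving feedback, a short script can do it. This is a sketch under an assumption: it reads "XYZ-123" as three uppercase letters, a hyphen, and one or more digits. Adjust the pattern to whatever your real IDs look like.

```python
import re

# Assumed reading of "XYZ-123": three uppercase letters, a hyphen, then digits.
ITEM_ID_PATTERN = re.compile(r"[A-Z]{3}-\d+")


def split_by_id_format(item_ids):
    """Separate IDs that match the expected format from those that do not."""
    valid, invalid = [], []
    for item_id in item_ids:
        cleaned = item_id.strip()  # ignore stray whitespace around the value
        if ITEM_ID_PATTERN.fullmatch(cleaned):
            valid.append(cleaned)
        else:
            invalid.append(cleaned)
    return valid, invalid


# Illustrative values only.
valid_ids, invalid_ids = split_by_id_format(["XYZ-123", "xyz-7", " ABC-42 ", "XYZ123"])
print(valid_ids)    # ['XYZ-123', 'ABC-42']
print(invalid_ids)  # ['xyz-7', 'XYZ123']
```

Any values that land in the invalid list are good candidates to quote back to Lutra as examples of what it got wrong.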

Frequently asked questions

What websites can Lutra scrape?
Lutra works best with websites that have static pages (possibly rendered with JavaScript). It can handle most websites, including paginated content. If you find that Lutra is not loading the data correctly, you can ask it to "wait a few seconds for the content to load" or provide a specific CSS element that it should look for.
Is it legal to scrape websites?
Scraping legality varies by jurisdiction and website. Users are responsible for ensuring compliance with local laws and website terms of service.
How much data can Lutra handle?
Lutra is scalable and supports everything from small-scale projects to enterprise-level extractions. However, it's always a good idea to verify with small tests before scaling.
Can I scrape data across multiple pages?
Yes, Lutra supports pagination, allowing it to scrape data across multiple pages seamlessly. This works best when the pagination structure is in the URL - tell Lutra the format, and it will do its best.
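
If you want to prepare the list of page URLs yourself (for example, to paste into a spreadsheet for Lutra to process), the pattern can be expanded with a few lines of Python. This is an illustrative sketch, not how Lutra works internally. The example URL and the parameter name "page" are assumptions; use the ones from your target site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def paginated_urls(example_url, first_page, last_page, param="page"):
    """Build one URL per page by replacing (or adding) the page query parameter."""
    parts = urlsplit(example_url)
    # Keep every query parameter except the page parameter, which is set below.
    other_params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    urls = []
    for page in range(first_page, last_page + 1):
        query = urlencode(other_params + [(param, str(page))])
        urls.append(urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment)))
    return urls


# Hypothetical listing URL used only for illustration.
page_urls = paginated_urls("https://example.com/listings?city=austin&page=1", 1, 3)
for url in page_urls:
    print(url)
```

Note that this sketch always places the page parameter last in the query string, which matches the "page=1 at the end" pattern described earlier.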

How do I integrate Lutra with my existing tools?
Lutra offers direct integrations with platforms like Google Sheets, CRMs, and analytics tools. Data exports in formats like CSV and JSON make it easy to import into other applications. You can also set up webhooks with Lutra for integrations. Contact support to discuss custom integrations.
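
Some downstream tools accept only CSV. If you have a JSON export, a short conversion step is enough. The sketch below assumes the export is a JSON array of flat objects. That shape, and the field names in the sample, are assumptions made for illustration and not a documented Lutra format.

```python
import csv
import io
import json


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    # Collect column names in first-seen order so no record's fields are dropped.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields missing from a record become empty cells
    return buffer.getvalue()


# Illustrative sample, assumed shape: a list of flat objects.
sample_json = (
    '[{"name": "Ada", "company": "Example Co"},'
    ' {"name": "Grace", "email": "grace@example.com"}]'
)
csv_text = json_records_to_csv(sample_json)
print(csv_text)
```

Nested objects would need flattening first; this sketch deliberately handles only flat records.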

Starter playbooks

Extract a list of names from a website.
Extract a list of companies from a website.