How to scrape job postings with Hystruct

by Alex, Founder

Scraping job postings is a common use case for Hystruct, and it makes a good example because each posting mixes several types of data: the job title, apply URL, posting date, and salary information. In this blog post, I'll walk you through setting up a job scraping workflow in Hystruct. (If you'd prefer a visual guide, you can watch the video above!)

Step 1: Identify the Data to Scrape

Let's start by identifying the data we want to scrape. We've got a job board website with multiple job postings, and we want to extract some information from each posting. We'll define the data in the next step when setting up the schema in Hystruct.

Step 2: Set Up the Schema in Hystruct

Next, we head over to the Hystruct dashboard. When you first create a Hystruct account, a “job post” schema is automatically added to help you get started. This schema includes commonly used fields, but you can customize it to fit your needs. Alternatively, you can create a new schema from scratch.
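
To make the idea of a schema more concrete, here is a minimal Python sketch of the fields mentioned at the top of this post (job title, apply URL, posting date, and salary). The class name, field names, and types are my own illustration and not Hystruct's actual schema format, which you define in the dashboard.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class JobPost:
    """Hypothetical shape of one scraped job post (illustrative only)."""
    job_title: str
    apply_url: str
    posting_date: Optional[str] = None  # e.g. a date string, if the post shows one
    salary: Optional[str] = None        # kept as free text, since formats vary


# A post that lists no date or salary still fits the schema.
example = JobPost(
    job_title="Backend Engineer",
    apply_url="https://example.com/jobs/123/apply",
)
print(asdict(example))
```

Making the date and salary optional reflects that many job boards leave them out, so a schema usually needs to tolerate missing values.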

Step 3: Create a New Workflow

With our schema ready, we can now create a new workflow. Here's how:

  1. Go to the “Workflows” section and click on “Create Workflow.”
  2. Select the “job post” schema from Step 2.
  3. Enter the URL of the job board we want to scrape. This should be the main page where all job posts are listed, rather than a specific job post page.
  4. Enable “loops” since we're scraping more than one job post at a time. This tells the AI to iteratively scrape all job posts from the URL.
  5. Set the “noun” to “job posting.” This helps the AI understand the context of the data it's scraping.
  6. Click “Create Workflow.”

Step 4: Run the Workflow

Now, let's run the workflow by clicking “Run job now.” The Hystruct AI will start by scraping the first page and looking for all job posts. It will then visit each job post's page and extract all the information defined in our schema. The time it takes to complete depends on the size of the workflow and the number of pages. If our job board only has a few pages, it should be pretty quick!
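
If it helps to picture the two phases (find the job posts on the listing page, then visit each one), here is a rough conceptual sketch in Python using only the standard library. This is not how Hystruct works internally, since Hystruct uses AI to find the posts. The `find_job_links` function, the `/jobs/` path rule, and the example URLs are all assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_job_links(listing_html, base_url, path_marker="/jobs/"):
    """Return absolute URLs of links that look like individual job posts.

    path_marker is a hypothetical rule that only suits this example board.
    """
    parser = LinkCollector()
    parser.feed(listing_html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        # Keep job-like links and skip duplicates, preserving page order.
        if path_marker in url and url not in links:
            links.append(url)
    return links


# Phase 1: look at the listing page and find the job posts.
listing_html = """
<a href="/about">About</a>
<a href="/jobs/1">Backend Engineer</a>
<a href="/jobs/2">Designer</a>
<a href="/jobs/1">Backend Engineer (duplicate link)</a>
"""
job_links = find_job_links(listing_html, "https://example.com/careers")

# Phase 2: each of these pages would then be scraped against the schema.
for url in job_links:
    print("would scrape:", url)
```

A hand-written rule like `path_marker` breaks as soon as a job board uses a different URL layout, which is the kind of per-site work the “loops” setting is meant to spare you.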

Step 5: Check and Download the Data

While the workflow is running, refresh the page to check its progress. Once the scraping is finished, you can view the data directly on the Hystruct dashboard or download it in JSON or CSV format. Additionally, you can use our API to access the information.
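
As a small example of working with a downloaded file, the Python sketch below filters a JSON export and writes the result as CSV. The export's structure and field names here are assumptions (a flat list of objects), so check your own download and adjust the keys. The inline string stands in for the file contents so the snippet runs on its own.

```python
import csv
import io
import json

# Stand-in for the contents of a downloaded JSON export. The real export's
# structure and field names may differ; a list of flat objects is assumed.
exported = """[
  {"job_title": "Backend Engineer", "apply_url": "https://example.com/jobs/1", "salary": null},
  {"job_title": "Designer", "apply_url": "https://example.com/jobs/2", "salary": "$90k"}
]"""

posts = json.loads(exported)

# Keep only the posts that list a salary.
with_salary = [p for p in posts if p.get("salary")]

# Write the filtered rows as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["job_title", "apply_url", "salary"])
writer.writeheader()
writer.writerows(with_salary)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real download, you would replace the inline string with the contents of your file, for example by opening it and passing the file object to `json.load`.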

Conclusion

Scraping structured data with Hystruct is straightforward and efficient. If you have any questions about using Hystruct, feel free to email the team at team@hystruct.com.

Thanks for following along!