How to extract tool signals from job postings

Tool signals extracted from job postings are a valuable source of leads for similar products, courses, or extensions. This hands-on guide shows how to identify companies that use Salesforce.

Posted by Jörg Rech on August 12, 2024 at 12:12:26

Reading Time: 14 min read

1. Introduction

In today's competitive landscape, understanding your customers' tool landscape can be a game-changer. Imagine knowing precisely which tools, software, or machinery your current or potential customers are using. With this insight, you could tailor your offerings to meet their exact needs, increasing your chances of closing deals and driving growth.

Companies often reveal critical information about their activities and the tools they use in their job postings. From the software they rely on to the teams they are building, these signals can provide invaluable insights into their current and future needs.

In this article, we’ll guide you through the process of extracting company signals—specifically related to tools, machinery, software, and SaaS—from job postings. Leveraging Techmap's free data feed, we’ll demonstrate how to identify and analyze these signals to create actionable insights that can drive your sales strategy.

Outline of the Article

  1. Introduction
  2. The Power of Company Signals
  3. Extracting Data from Job Postings
  4. Identifying and Analyzing Tool Signals
  5. Practical Steps to Implement Extraction
  6. Conclusion

2. The Power of Company Signals

Company signals are subtle indicators that reveal a company's activities, strategies, internal structures, and future plans. These signals can be found in various sources, including job postings, press releases, financial reports, and social media. However, job postings offer a particularly rich vein of information. By identifying company signals within job postings, businesses can:

  • Tailor sales pitches to specific company needs.
  • Prioritize leads based on potential fit and interest.
  • Gain a competitive edge by understanding competitor strategies.

The company signals found in job postings typically fall into the following categories:

  • Product-related signals provide information about a company's offerings, such as products or services. These signals can be leveraged to identify potential customers, opportunities to differentiate one's own products, or upselling and cross-selling opportunities.
  • Operations-related signals provide information about how a company operates and is organized, both internally and externally. These signals can be utilized to identify opportunities for business deals and reveal potential areas for collaboration, co-marketing, or strategic alliances.
  • Work-related signals provide information about how a company's employees work, including the tools and technologies they use. These signals can be exploited to identify opportunities for offering products or services that replace, enhance, or complement the mentioned solutions.
  • Financial-related signals provide information about a company's financial status, such as investments, revenue, expenses, profits, and losses. These signals can be harnessed to identify opportunities for offering hiring, investment, or advisory services that align with the company's financial health and objectives.
  • Image-related signals provide information about how a company is perceived or aims to be perceived by the public. These signals can be leveraged to identify opportunities for offering branding, public relations, or marketing services that enhance the company's public image and align with its desired brand identity.
  • Growth-related signals provide information about a company's current performance and growth trajectory, as well as insights into its future ambitions and plans. These signals can be utilized to identify opportunities for offering business development, strategic planning, or consulting services that support the company's expansion and scalability.

For more on company signals in general, see our article "A Classification of Company Signals in Sales Intelligence".

In this article, we focus on tool signals, a subclass of work-related signals: mentions of tools, machinery, software, or SaaS solutions in job postings. Identifying these signals can reveal opportunities for offering complementary products, training, or more advanced solutions. For example, a company hiring for expertise in tools like Asana or Jira may need project management training or be open to competing solutions.

3. Extracting Data from Job Postings

3.1. Understanding Job Postings

Job postings are more than just advertisements for open positions—they are rich sources of information about a company’s operations, goals, and tool landscape. Typically, a job posting contains:

  • Company Information: Details about the industry, company size, locations, products, and culture.
  • Job Information: Descriptions of roles, responsibilities, work environment, and objectives.
  • Requirement Information: Desired experience, skills, certifications, and other qualifications, including experience with specific tools or tasks.
  • Compensation Information: Salary ranges, benefits, and incentives.
  • Application Information: Instructions for applying, deadlines, and contact details.

Additionally, job postings often reveal information about the departments or teams the new hire will join, the technology they will use, and potential career paths within the company.

In this article, we concentrate on extracting tool-related information from job postings—specifically from the company description and job requirement sections—to identify signals about the tools, engines, machinery, software, or SaaS solutions in use.

3.2. Origin of Job Postings

Companies disseminate their job postings across various platforms, including their corporate websites (e.g., careers.microsoft.com or amazon.jobs), HR tool career pages (e.g., Personio), job boards (e.g., Monster), aggregators (e.g., Indeed), and social media platforms (e.g., LinkedIn).

Our job posting data is aggregated from these sources, cleaned, normalized, and stored in data files on AWS S3. These datasets are available on the AWS Data Exchange (ADX) platform, which allows businesses to securely access and utilize third-party data for analytics, machine learning, and other data-driven applications.

For this guide, we'll use the free Luxembourg data feed from Techmap on ADX. This feed provides access to historical job posting data since January 2020; the data for June 2024 contains ~15k job postings. The primary sources include LinkedIn, CareerJet, Indeed, Eures, and SmartRecruiters. For Luxembourg, the compressed data files typically range from 100 KB to 1.5 MB per day.

3.3. Data in Job Postings

Our data is exported daily in JSON Lines format and stored in gzip-compressed files: each line of an uncompressed file contains one job posting as a JSON object with the following primary fields:

Table 1: Data Dictionary for Techmap's job postings (primary fields)

| Field name | Data type | Description | Example |
|---|---|---|---|
| name | String | Title of the job | Cyber Security Project Manager |
| url | String | The link to the job posting (often only valid for a month after dateCreated) | https://lu.linkedin.com/jobs/view/cyber-security-project-manager-at-wds-global-limited-3949517399 |
| dateScraped | Date | The day and time we found the job posting | 2024-06-15T18:57:58+0000 |
| dateCreated | Date | The day and time the job was published | 2024-06-15T10:28:56+0000 |
| location | JSON | Information on the location of the job | {"orgAddress": { "addressLine": "Luxembourg, Luxembourg, Luxembourg", "countryCode": "lu", "country": "Luxembourg", "state": "Luxembourg", "city": "Luxembourg", "geoPoint": {"lat": 49.61, "lng": 6.129627 } } } |
| company | JSON | Information on the company as found on the job posting's page | {... "name": "State Street", "source": "linkedin_lu", "idInSource": "state-street", "urls": {"linkedin_lu": "https://www.linkedin.com/company/state-street", …}, … } |
| text | String | Plain text version of the job description | ... We are looking for a skilled and experienced Cybersecurity Project Manager to lead customer’s cybersecurity initiatives ( location : Luxembourg). The ideal candidate will be a strategic thinker with a robust technical background , exceptional … |

4. Identifying and Analyzing Tool Signals

With an understanding of how job posting data is structured, the next step is to prepare an extraction algorithm to identify tool signals. These signals are crucial for understanding the tools companies are currently using, which can inform your sales and product strategies.

In this guide, we use Linux shell commands to download and process the data files, and regular expressions (regex) written in Java syntax to define the identification rules. The shell script in Section 5.2 applies them with jq, whose regex engine (Oniguruma) accepts this pattern as well. If you use other programming languages such as Python or JavaScript, you may need to adapt the regex slightly.

Even in Luxembourg’s multilingual environment, tool names are often consistent across languages. This allows us to identify tool signals without the need for translations or synonyms.

In this article, we want to identify companies that use Salesforce. Below is an example of an identification regex. The leading (?i) flag makes it case-insensitive in Java; in JavaScript, use the /…/i flag instead, and in Python, pass re.IGNORECASE (or keep the inline (?i) at the start of the pattern):

(?i)\b(Salesforce)\b

To test the regex, you can use the online tool Regex101 with the following text:

You will use Salesforce CRM database
Experience with Salesforce.com CRM application
the multifonds and salesforce software
Use CRM system (SalesForce)
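If you prefer to check the rule locally rather than on Regex101, grep offers a close approximation: -i plays the role of (?i), and -w accepts a match only if it forms a whole word, like the \b…\b boundaries. The helper name count_salesforce_lines and the last two sample lines are illustrative additions, not part of the original test text.

```shell
#!/bin/sh
# Hypothetical helper: count stdin lines that mention "salesforce" as a whole word,
# ignoring case (-i ~ (?i), -w ~ \b...\b, -c prints the number of matching lines)
count_salesforce_lines() {
    grep -ciw 'salesforce'
}

# The four test sentences from above, plus two illustrative non-matches
printf '%s\n' \
    'You will use Salesforce CRM database' \
    'Experience with Salesforce.com CRM application' \
    'the multifonds and salesforce software' \
    'Use CRM system (SalesForce)' \
    'We are growing our salesforces in Benelux' \
    'Experience in sales force management' \
    | count_salesforce_lines
```

Run as-is, this prints 4: all four test sentences match, while the plural "salesforces" and the two-word "sales force" do not.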

5. Practical Steps to Implement Extraction

With the regex prepared, we can proceed to identify tool signals in job postings. The process involves downloading the data files from AWS S3, converting them to a more accessible format, and using shell commands to extract relevant signals.

Ensure that you have the necessary shell commands installed (aws, gzip, grep, sort, uniq, and jq). If you don't have an AWS account, follow the AWS tutorial "Setting up AWS Data Exchange" to create one.
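A quick way to confirm that these commands are on your PATH is the POSIX command -v builtin. The sketch below wraps it in a hypothetical helper, check_tools, that reports every missing command instead of stopping at the first one.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if all are available, 1 otherwise.
check_tools() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the commands used in this guide
if check_tools aws gzip grep sort uniq jq; then
    echo "All required commands are installed."
else
    echo "Please install the missing commands listed above."
fi
```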

To access our job postings, follow AWS's tutorial on subscribing to an AWS Data Exchange for Amazon S3 product, but subscribe to our free Luxembourg data feed on ADX instead of the tutorial's test product.

5.1. Downloading the job postings

After subscribing to the data feed, you'll receive from AWS an access point alias for our S3 data bucket; the Luxembourg job postings are stored under the prefix lu/. The alias should end with -s3alias, and in the following commands we use <YOUR_BUCKET_ALIAS> as a placeholder for it.
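To catch a mistyped alias early, you can store it in a variable and check its suffix before running any commands. This is a sketch; the helper name is_access_point_alias and the variable BUCKET_ALIAS are hypothetical, and the check only looks at the -s3alias suffix, not at whether the alias actually exists.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument ends with "-s3alias",
# the suffix an S3 access point alias is expected to have
is_access_point_alias() {
    case "$1" in
        *-s3alias) return 0 ;;
        *) return 1 ;;
    esac
}

# Replace the placeholder with the alias you received from AWS
BUCKET_ALIAS='<YOUR_BUCKET_ALIAS>'
if is_access_point_alias "$BUCKET_ALIAS"; then
    echo "Using access point alias: $BUCKET_ALIAS"
else
    echo "Warning: '$BUCKET_ALIAS' does not end with -s3alias" >&2
fi
```

The commands below can then reference "$BUCKET_ALIAS" instead of the literal placeholder.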

The following commands demonstrate how to download the compressed data files:

List all files from June 2024

aws s3api list-objects-v2 \
  --request-payer requester \
  --bucket <YOUR_BUCKET_ALIAS> \
  --prefix 'lu/techmap_jobs_lu_2024-06-' | grep Key

Download one individual file

aws s3 cp \
  --request-payer requester \
  s3://<YOUR_BUCKET_ALIAS>/lu/techmap_jobs_lu_2024-06-01.jsonl.gz .
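The file names follow a daily pattern, so the keys for a month can also be generated locally rather than listed from S3. This sketch assumes exactly one file per day, which is what the file names above suggest; the helper name month_keys is hypothetical, and it does not check whether each file actually exists in the bucket.

```shell
#!/bin/sh
# Hypothetical helper: print the S3 keys of the daily files for one month,
# assuming names like lu/techmap_jobs_lu_2024-06-01.jsonl.gz.
# $1 = month as YYYY-MM, $2 = number of days in that month
month_keys() {
    day=1
    while [ "$day" -le "$2" ]; do
        # %02d zero-pads the day, e.g. 1 -> 01
        printf 'lu/techmap_jobs_lu_%s-%02d.jsonl.gz\n' "$1" "$day"
        day=$((day + 1))
    done
}

# Show the first three keys for June 2024
month_keys 2024-06 30 | head -n 3
```

Each generated key can be passed to the aws s3 cp command above, for example inside a while read -r key loop; the aws s3 sync command below covers the whole month in a single call.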

Download all files from June 2024 (requires 19 MB)

aws s3 sync \
    s3://<YOUR_BUCKET_ALIAS>/lu/ . \
    --request-payer requester \
    --exclude "*" \
    --include "techmap_jobs_lu_2024-06-*.jsonl.gz"

Decompress all files from June 2024 (results in 130 MB)

gzip -d techmap_jobs_lu_2024-06-*.jsonl.gz
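Because each line holds one posting, you can verify the volume (~15k postings for June 2024, according to the feed description above) by counting lines. This sketch reads the .gz files directly with gzip -dc, so it can run before decompressing (gzip -d replaces each .gz file with its decompressed version). The helper name count_postings is hypothetical, and it assumes every line, including the last, ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: print the number of job postings (= lines) per
# gzip-compressed JSON Lines file, followed by the total
count_postings() {
    total=0
    for f in "$@"; do
        # Skip patterns that did not match any file
        [ -e "$f" ] || continue
        n=$(gzip -dc "$f" | wc -l | tr -d ' ')
        printf '%s %s\n' "$f" "$n"
        total=$((total + n))
    done
    printf 'total %s\n' "$total"
}

count_postings techmap_jobs_lu_2024-06-*.jsonl.gz
```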

5.2. Filtering the job postings

With the job postings now stored as text files in JSON Lines format, we can identify tool signals. The next step loops through these files, applies our regex to detect tool mentions, and writes a small JSON object whenever a match is found. Each object captures the job title, URL, company name, location, and the matched snippets (keywords in context), which allows manual verification to distinguish true from false positives.

This method turns thousands of raw job postings into a short, reviewable list of tool signals that shows which companies mention a given tool.

#!/bin/bash

# Define the tool regex (case-insensitivity is applied via jq's "i" flag below)
REGEX='\b(Salesforce)\b'

# Define and clear the output file for the results
OUTPUT_FILE="company_signals.txt"
printf '' > "$OUTPUT_FILE"

# Loop over all decompressed files from June 2024
for file in techmap_jobs_lu_2024-06-*.jsonl; do
    # If no file matches, bash keeps the unexpanded pattern, so check that the file exists
    if [[ -e "$file" ]]; then
        # Keep a posting once if any top-level string field matches the regex
        # (any/2 avoids emitting the same posting once per matching field),
        # then output key fields plus up to 20 characters of context per match
        jq --arg regex ".{0,20}$REGEX.{0,20}" '
          select(any(to_entries[]; .value | type == "string" and test($regex; "i")))
          | {
              job_name: .name,
              job_url: .url,
              company_name: .company.name,
              location: (.location.orgAddress.addressLine // .location.orgAddress.city),
              matched_text: [
                to_entries[] | .value | select(type == "string") | match($regex; "i").string
              ] | unique | map("..." + . + "...") | join(", ")
            }
        ' "$file" >> "$OUTPUT_FILE"
    else
        echo "No files matching the pattern found."
    fi
done
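Since only a small fraction of postings mention a given tool (96 out of roughly 15k for Salesforce in June 2024), a cheap text prefilter can reduce the work jq has to do, and reading the .gz files directly avoids storing the 130 MB of decompressed data. This is a sketch of that design choice, not part of the original workflow; the helper name candidate_postings is hypothetical, and any speed-up depends on your data.

```shell
#!/bin/sh
# Hypothetical helper: stream gzip-compressed JSON Lines files and keep only
# lines that contain the tool name anywhere (case-insensitive fixed-string match).
# This is a loose prefilter; the exact \b...\b rule is still applied by jq afterwards.
# $1 = tool name, remaining arguments = .jsonl.gz files
candidate_postings() {
    tool="$1"
    shift
    for f in "$@"; do
        # Skip patterns that did not match any file
        [ -e "$f" ] || continue
        gzip -dc "$f"
    done | grep -iF -- "$tool"
}
```

To use it, pipe its output into the same jq command as in the script above in place of the "$file" argument, e.g. starting from candidate_postings Salesforce techmap_jobs_lu_2024-06-*.jsonl.gz. The prefilter deliberately skips the -w check: inside raw JSON, a newline right before a tool name is stored as \n, so the character before the name is the letter n and a whole-word test on the raw text could miss it. The exact word-boundary rule is therefore left to jq, which sees the decoded strings.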

Running this code took just 64 seconds to process ~15k job postings from Luxembourg, distributed across 30 decompressed files. The output file generated is a streamlined dataset, capturing only the job postings where a tool signal was detected. Here's an excerpt of what the output looks like:

…
{
  "job_name": "Marketing Automation & Web Implementation Advisor",
  "job_url": "https://lu.linkedin.com/jobs/view/marketing-automation-web-implementation-advisor-at-bp-3942120716",
  "company_name": "bp",
  "location": "Bertrange, Luxembourg, Luxembourg",
  "matched_text": "...rience working with Salesforce (Pardot & Marketing..."
}
{
  "job_name": "New Business / AM Benelux",
  "job_url": "https://lu.linkedin.com/jobs/view/new-business-am-benelux-at-k%C3%B6rber-supply-chain-3942305015",
  "company_name": "Körber Supply Chain",
  "location": "Luxembourg",
  "matched_text": "...osals. You will use Salesforce CRM database to rec..."
}
{
  "job_name": "Digital Content Editor (English Speaker)",
  "job_url": "https://www.careerjet.lu/jobad/lu25ef2f6ad1990aa2cba6f4fb1a8f47f3",
  "company_name": "VML Luxembourg",
  "location": "Luxemburg, Luxemburg",
  "matched_text": "...cluding Adobe, SAP, Salesforce, HCL, Shopify, Site..."
}
…

To extract unique companies from job postings, regardless of the number of postings they placed, use the following code snippet:

grep '"company_name"' company_signals.txt | sort -u

The result looks like this:

  "company_name": "Amazon Web Services (AWS)",
  "company_name": "Amazon",
  "company_name": "Aylo",
  "company_name": "Azenta Life Sciences",
  "company_name": "BNP PARIBAS ASSET MANAGEMENT Luxembourg S.A.",
  "company_name": "Deloitte",
  "company_name": "INDR",
  "company_name": "Imerys",
  "company_name": "Infinity Quest",
  "company_name": "Intelsat",
  "company_name": "Jetfly",
  "company_name": "Johnson Controls International",
  "company_name": "Johnson Controls",
  "company_name": "Keter",
  "company_name": "Körber Supply Chain",
  "company_name": "LHH Luxembourg",
  "company_name": "Promon",
  "company_name": "Remote",
  "company_name": "VML Luxembourg",
  "company_name": "VML",
  "company_name": "Volkswagen Losch Financial Services",
  "company_name": "bp",
  "company_name": "myGwork - LGBTQ+ Business Community",
  "company_name": "skeeled",

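The unique list hides how many postings each company placed. A sketch that keeps that count, assuming the pretty-printed jq output shown above (one "company_name": "<name>" line per posting); the helper name postings_per_company is hypothetical, and postings whose company_name is null are skipped by the sed expression. Note that name variants such as "VML" and "VML Luxembourg" are counted separately, so 24 unique names may correspond to fewer distinct companies.

```shell
#!/bin/sh
# Hypothetical helper: count matched postings per company name,
# most frequent first, from the pretty-printed results file
postings_per_company() {
    # Print only the string value of each "company_name" line (null values are skipped)
    sed -n 's/^ *"company_name": "\(.*\)",\{0,1\}$/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}

if [ -f company_signals.txt ]; then
    # Total number of matched postings
    grep -c '"company_name":' company_signals.txt
    # The ten companies with the most matched postings
    postings_per_company company_signals.txt | head -n 10
fi
```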
5.3. Analyzing the Results

With the results in hand, we can now cross-reference them with our existing company data, including customer or competitor information, to effectively leverage these new signals. This approach enables more targeted and strategic decision-making, ultimately strengthening your competitive advantage.

Based on your internal workflows, you may opt to store this data in a database rather than in JSON or CSV format, allowing for more efficient post-analysis. Alternatively, you could directly forward the insights to your sales team via email, ensuring they have timely and actionable information at their fingertips.
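If the results should go into a spreadsheet, a CRM, or a database import, CSV is a convenient intermediate format. A sketch using jq's @csv filter, assuming the field names produced by the filtering script; the helper name signals_to_csv and the output file company_signals.csv are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert the stream of JSON objects in the results file
# into CSV with a header row and one row per matched posting.
# jq reads the concatenated, pretty-printed objects one after another;
# @csv quotes strings and writes null values as empty fields.
signals_to_csv() {
    printf '%s\n' 'company_name,job_name,job_url,location,matched_text'
    jq -r '[.company_name, .job_name, .job_url, .location, .matched_text] | @csv' "$1"
}

if [ -f company_signals.txt ]; then
    signals_to_csv company_signals.txt > company_signals.csv
fi
```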

6. Conclusion

By extracting and analyzing tool signals from job postings, you can gain a significant competitive advantage. These insights allow you to better understand your prospects' needs, tailor your sales strategies, and ultimately drive growth.

Leveraging Techmap's data feed, your sales intelligence team can regularly monitor tool usage trends across various industries, enabling you to stay ahead of the competition. Whether you're aiming to identify users of specific tools, spot new market opportunities, or refine your product offerings, these signals are a vital resource in your sales toolkit.

In our analysis of tool signals for "Salesforce" in Luxembourg, we identified 96 job postings mentioning "Salesforce" across 24 unique companies in June 2024. Analyses of other months and regions, such as the USA, revealed a similar pattern: in June 2024, we found tool signals for "Salesforce" in postings from about 4,000 companies across the USA, within a dataset of approximately 1 million job postings.

All of this works without machine learning or AI, although such technologies could further enhance the process by automating extraction, increasing precision, and uncovering deeper insights.

Ready to unlock new insights and enhance your sales strategy? Explore Techmap's free Luxembourg data feed on AWS Data Exchange today and start turning data into actionable intelligence.
