Job Data Overview

1. Data Feed Overview

Techmap collects job postings from 127+ sources - ATS career pages, job boards, aggregators, and employment offices - and packages them as gzip-compressed JSON Lines files, one file per country per day. The Data Feeds product provides a rolling 3-month window that is updated every 24 hours. Historical Datasets extend that window back to January 2020. Both products are available through AWS Data Exchange, which delivers the files to an S3 bucket automatically. For small niches or lower volumes, the Job Postings API covers the same 3-month window.

Common use cases

  • Daily ingestion pipeline: Subscribe to a country feed, pull the new file each morning from S3, parse the JSON Lines, and upsert records into your database or data warehouse.
  • ML training data: Use full posting text and structured fields like position.*, salary.*, and orgTags.* as features for classification, NLP, or salary prediction models.
  • Job board backfilling: Load the daily feed for one or more countries and insert new job postings into your job board database each hour or day without building your own crawler.
  • Salary benchmarking: Filter records that carry salary data, normalize amounts to a common currency and to annual figures, and build compensation distributions by role, seniority, and country.
  • Company expansion tracking: Group postings by company, country, and location over time to detect when a company starts hiring in a new location - an early signal of geographic expansion.
  • Hiring trend analysis: Load dataset files and group by country, dateCreated, and other fields or keywords to build month-over-month hiring volume time series going back to January 2020.
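
As an illustration of the daily ingestion use case, the following is a minimal sketch of an upsert into SQLite. It assumes that the pair (source, idInSource) identifies a posting uniquely, which this page does not guarantee. The table layout and the function name upsert_feed are hypothetical, and the ON CONFLICT syntax needs SQLite 3.24 or newer.

```python
import gzip
import json
import sqlite3


def _as_text(value):
    """Keep strings as-is; serialize anything else (e.g. nested objects) to JSON text."""
    if value is None or isinstance(value, str):
        return value
    return json.dumps(value)


def upsert_feed(conn: sqlite3.Connection, path: str) -> int:
    """Insert or update every posting of one feed file; returns the rows written."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            source       TEXT NOT NULL,
            id_in_source TEXT NOT NULL,
            title        TEXT,
            date_created TEXT,
            url          TEXT,
            PRIMARY KEY (source, id_in_source)  -- assumed identity key
        )
        """
    )
    written = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            job = json.loads(line)
            source, job_id = job.get("source"), job.get("idInSource")
            if source is None or job_id is None:
                continue  # cannot build the key, skip the record
            conn.execute(
                """
                INSERT INTO jobs (source, id_in_source, title, date_created, url)
                VALUES (?, ?, ?, ?, ?)
                ON CONFLICT (source, id_in_source) DO UPDATE SET
                    title = excluded.title,
                    date_created = excluded.date_created,
                    url = excluded.url
                """,
                (source, str(job_id), job.get("name"),
                 _as_text(job.get("dateCreated")), job.get("url")),
            )
            written += 1
    conn.commit()
    return written
```

Calling upsert_feed(sqlite3.connect("jobs.db"), "techmap-jobs_us_2026-06-06.json.gz") once per daily file keeps one row per posting, with later files overwriting earlier versions of the same posting. For a data warehouse, the same idea usually maps to a MERGE statement.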

1.1 Which Product is Right for You?

Four access options are available depending on your use case:

| | Data Feeds | Historical Datasets | Job Postings API | RSS API |
|---|---|---|---|---|
| Access | AWS Data Exchange | AWS Data Exchange | RapidAPI | RapidAPI |
| Lookback | Last 3 months | Jan. 2020 to today | Last 3 months | Last 3 months |
| Output format | JSON Lines (.json.gz) | JSON Lines (.json.gz) | JSON (default), CSV, RSS, ATOM, XML, Parquet | RSS 2.0 |
| Billing | Per country subscription | Per country purchase | Per request (10 jobs each) | Per request (10-1000 jobs each) |
| Best for | Bulk ETL, recurring ingestion pipelines | Historical analysis, ML training data, one-off research | Real-time queries, filtering, analytics, job board feeds | Hosted job boards consuming RSS (jBoard, NiceBoard, ...) |
| Field reference | This page | This page | Job API Overview | Job API Overview |

1.2 How to Subscribe

Data Feeds and Historical Datasets are available through AWS Data Exchange. Payment is securely handled by AWS and billing is consolidated into your existing AWS account - no separate vendor contract or credentials needed. Once subscribed, new daily files appear automatically in an S3 bucket alias in your own AWS account, ready to pull into any S3-compatible pipeline.

  1. Browse the listings: Visit the Techmap seller profile on AWS Marketplace and choose the country or region you need.
  2. Subscribe: Choose a Job Data Feed (last 3 months) or purchase a Historical Dataset (Jan. 2020 to today) for a country. For multi-country bundles, contact us.
  3. Pull files from S3: After you subscribe, AWS Data Exchange grants you read access to the S3 bucket that holds the daily files, and new files appear there automatically. Use the AWS CLI, boto3, or any S3-compatible tool to pull them into your pipeline.
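
Step 3 could look roughly like the sketch below, which downloads one daily file unless a local copy already exists. The function name download_daily_file is hypothetical, and the bucket alias and object key are placeholders: use the values shown in your AWS Data Exchange entitlement. The client is passed in so that the function works with any object that offers boto3's download_file(Bucket, Key, Filename, ExtraArgs=...) method.

```python
from pathlib import Path
from typing import Any, Optional


def download_daily_file(
    s3_client: Any,
    bucket: str,
    key: str,
    dest_dir: str = ".",
    extra_args: Optional[dict] = None,
) -> Path:
    """Download one feed file unless a local copy already exists."""
    target = Path(dest_dir) / Path(key).name
    if target.exists():
        return target  # already pulled on an earlier run
    target.parent.mkdir(parents=True, exist_ok=True)
    # extra_args is forwarded unchanged, e.g. {"RequestPayer": "requester"}
    # if your share requires it (an assumption, check your entitlement).
    s3_client.download_file(bucket, key, str(target), ExtraArgs=extra_args)
    return target


# Usage (needs `pip install boto3` and AWS credentials; names are placeholders):
#   import boto3
#   download_daily_file(
#       boto3.client("s3"),
#       "<your-bucket-alias>",
#       "techmap-jobs_lu_2026-06-06.json.gz",
#       dest_dir="feeds",
#   )
```

Skipping files that already exist makes the function safe to run from a daily scheduler. It does not detect partially downloaded files, so a production pipeline may want to download to a temporary name and rename on success.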

Test for free: The Luxembourg job dataset is available at no cost and contains approx. 250k postings starting January 2020, growing by roughly 10k records per month.

2. File Format & Delivery

The data files are gzip-compressed JSON Lines - one file per country per day. Each line in the decompressed file is a self-contained JSON object representing one job posting. Once decompressed, the files can be read directly by Python, Java, Spark, DuckDB, and BigQuery external tables.

2.1 File Structure

Files are named by country code and date: techmap-jobs_{countryCode}_{date}.json.gz. For example, the US feed for 2026-06-06 is techmap-jobs_us_2026-06-06.json.gz. Country codes follow the ISO 3166-1 alpha-2 standard used throughout the dataset, with these exceptions: "uk" is used instead of "gb", "gr" instead of "el", and "##" stands for remote.
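
The naming scheme can be captured in two small helpers, sketched below. The names feed_filename and parse_feed_filename are hypothetical. The sketch assumes that codes are always lowercase, as in the example above, and that "##" can appear in a file name, which this page does not confirm.

```python
import re
from datetime import date
from typing import Tuple

# Exceptions stated in the text: "uk" replaces "gb" and "gr" replaces "el".
CODE_OVERRIDES = {"gb": "uk", "el": "gr"}

FILENAME_RE = re.compile(
    r"^techmap-jobs_(?P<cc>[a-z]{2}|##)_(?P<day>\d{4}-\d{2}-\d{2})\.json\.gz$"
)


def feed_filename(country_code: str, day: date) -> str:
    """Build the file name for one country and day."""
    cc = country_code.lower()
    cc = CODE_OVERRIDES.get(cc, cc)
    return f"techmap-jobs_{cc}_{day.isoformat()}.json.gz"


def parse_feed_filename(name: str) -> Tuple[str, date]:
    """Split a file name into (country code, date); raise ValueError if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a feed file name: {name!r}")
    return match["cc"], date.fromisoformat(match["day"])


print(feed_filename("US", date(2026, 6, 6)))
print(parse_feed_filename("techmap-jobs_us_2026-06-06.json.gz"))
```

Parsing the date out of the name is useful when listing a bucket to find the days that still need to be downloaded.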

Each JSON object in the file maps to the export schema documented in Section 3. Nested fields use dot-notation keys in the schema reference (e.g., salary.*, company.*) but are stored as actual nested objects in the JSON.
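
To map between the two notations, a record can be flattened into dotted keys, as in this minimal sketch. The function name flatten and the sample record are made up for illustration, and the field values are not real data.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into a single dict with dot-notation keys.

    Lists and scalar values are kept as they are; empty nested objects
    contribute no keys.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Illustrative record, not taken from a real feed file.
example = {
    "name": "Data Engineer",
    "salary": {"amount": 50000, "currency": "EUR"},
    "location": {"orgAddress": {"city": "Luxembourg"}},
}
print(flatten(example))
```

The flattened keys line up with the names used in the schema reference, such as salary.amount or location.orgAddress.city, which is convenient when loading records into a columnar table.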

Note on field completeness: Not all fields are populated for every posting. The % Populated column in the data dictionary shows how frequently each field appears. Fields like salary.* (~23%) and contact.* (~31%) are sparse.
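
Because fill rates differ between files, it can help to measure them on the file you actually received. The sketch below counts how often selected top-level fields are populated. The function names are hypothetical, and treating null, empty strings, and empty containers as unpopulated is an assumption that may differ from how the published percentages were computed.

```python
import gzip
import json
from collections import Counter
from typing import Dict, Iterable


def is_populated(value) -> bool:
    """Treat missing keys, null, empty strings, and empty containers as unpopulated."""
    return value not in (None, "", [], {})


def fill_rates(path: str, fields: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of postings in one feed file that populate each field."""
    fields = list(fields)
    populated = Counter()
    total = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            job = json.loads(line)
            total += 1
            for field in fields:
                if is_populated(job.get(field)):
                    populated[field] += 1
    if total == 0:
        return {field: 0.0 for field in fields}
    return {field: round(100.0 * populated[field] / total, 1) for field in fields}
```

For example, fill_rates("techmap-jobs_us_2026-06-06.json.gz", ["salary", "contact", "referenceID", "locale"]) returns one percentage per field, which shows whether a sparse field is frequent enough for your use case before you build on it.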

2.2 Code Examples

Read and iterate a JSON Lines file (Python)

import gzip
import json

filename = "techmap-jobs_us_2026-06-06.json.gz"

with gzip.open(filename, "rt", encoding="utf-8") as fh:
    for line in fh:
        job = json.loads(line)
        title = job.get("name", "")
        company = (job.get("company") or {}).get("name", "")
        country = job.get("sourceCC", "")
        date_created = job.get("dateCreated", "")
        text = job.get("text", "")
        print(title, company, country, date_created)

Build a structured DataFrame from a feed file (Python + pandas)

import gzip
import json
import pandas as pd

def load_feed(path: str) -> pd.DataFrame:
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            job = json.loads(line)
            company = job.get("company") or {}
            location = (job.get("location") or {}).get("orgAddress") or {}
            position = job.get("position") or {}
            salary = job.get("salary") or {}
            records.append({
                "id": job.get("idInSource"),
                "source": job.get("source"),
                "country": job.get("sourceCC"),
                "date_created": job.get("dateCreated"),
                "title": job.get("name"),
                "company_name": company.get("name"),
                "city": location.get("city"),
                "state": location.get("state"),
                "work_type": position.get("workType"),
                "career_level": position.get("careerLevel"),
                "contract_type": position.get("contractType"),
                "salary_amount": salary.get("amount"),
                "salary_currency": salary.get("currency"),
                "salary_period": salary.get("period"),
                # "or" also covers an explicit null, where len(None) would fail
                "text_length": len(job.get("text") or ""),
                "url": job.get("url"),
            })
    return pd.DataFrame(records)

df = load_feed("techmap-jobs_us_2026-06-06.json.gz")
print(df.shape)
print(df.dtypes)

Load multiple daily files and aggregate (Python + DuckDB)

import duckdb

# DuckDB reads gzip-compressed JSON Lines directly and expands glob patterns,
# so a month of daily files can be queried without decompressing them first.
# Only the two fields needed for the aggregation are read, both as strings.
# Assumes dateCreated is serialized as an ISO 8601 string ("2026-06-06T...").
con = duckdb.connect()
result = con.execute("""
    SELECT
        sourceCC AS country,
        substr(dateCreated, 1, 7) AS month,
        COUNT(*) AS postings
    FROM read_json(
        'techmap-jobs_us_2026-06-*.json.gz',
        format = 'newline_delimited',
        columns = {'sourceCC': 'VARCHAR', 'dateCreated': 'VARCHAR'}
    )
    GROUP BY 1, 2
    ORDER BY 2, 3 DESC
""").fetchdf()
print(result.head(20))

3. Data Dictionary

The export schema below covers the fields available in the JSON Lines files delivered via Data Feeds and Historical Datasets. Fields marked with .* are nested objects - expand them with your JSON parser. The % Populated column reflects fill rates measured across a representative data slice; actual rates may vary by country and over time.

| Field | Type | % Populated | Description |
|---|---|---|---|
| source | String | 100% | The origin of the job posting, including a country code, e.g., 'monster_us'. |
| sourceCC | String (ISO3166) | 100% | The country code (ISO 3166) of the country the job posting is located in, e.g., 'us'. |
| idInSource | String | 100% | The ID used in the source the job posting originated from. |
| name | String | 100% | The title of the job posting. |
| url | String (URL) | 100% | The link where we found the job posting. |
| text | String | 100% | The text of the job posting extracted from the html field. |
| html | String (HTML) | 100% | The original HTML used on the website or stored in the JSON of the job posting page. |
| json | Object | 100% | The original JSON objects found in the job posting page, including schema.org job posting data if available. |
| referenceID | String | 30% | An ID stated on the job posting page, assigned by the hiring company for internal use. |
| position.* | Object | 100% | Data concerning the job position such as name, contract type (e.g., Permanent), work type (e.g., full-time), or career level (e.g., Junior). |
| salary.* | Object | 23% | Data concerning the job salary such as amount and period (e.g., 'Weekly'). |
| contact.* | Object | 31% | Data about contact information such as contact name, email, phone, or physical address. |
| orgTags.* | Object | 100% | Tags found on the job posting page (e.g., skills such as 'Java'), in the JSON (e.g., benefits), or from the browsing hierarchy (e.g., 'IT Jobs'). |
| location.orgAddress.* | Object | 100% | Data about the location of the job as originally stated on the job posting or its JSON. |
| company.* | Object | 100% | Data about the company of the job as originally stated on the job posting or its JSON. |
| locale | String | 28% | The language the job posting is written in as a language locale (e.g., 'en_US'). |
| dateCreated | Date | 100% | The date and time the job posting was created by the company (as stated on the page or its JSON). |
| dateScraped | Date | 100% | The date and time the job posting was loaded and analyzed. |
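
As an example of working with the sparse salary.* object, the sketch below converts a salary to an annual figure. Only 'Weekly' is documented as a period value on this page. The other period names and all multipliers (2080 working hours, 260 working days, 52 weeks, 12 months) are assumptions to adjust against the values you actually see in the data. Currency conversion is left out.

```python
from typing import Optional

# Periods per year. Only "Weekly" is confirmed by the field description;
# the remaining keys and every factor are illustrative assumptions.
PERIODS_PER_YEAR = {
    "hourly": 2080,
    "daily": 260,
    "weekly": 52,
    "monthly": 12,
    "yearly": 1,
}


def annual_salary(job: dict) -> Optional[float]:
    """Return salary.amount scaled to one year, or None if it cannot be derived."""
    salary = job.get("salary") or {}
    period = str(salary.get("period") or "").strip().lower()
    factor = PERIODS_PER_YEAR.get(period)
    if factor is None:
        return None  # unknown or missing period
    try:
        amount = float(salary.get("amount"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric amount
    return amount * factor


print(annual_salary({"salary": {"amount": 1000, "period": "Weekly", "currency": "EUR"}}))
```

Returning None instead of guessing keeps incomplete salary records out of a compensation distribution, which matters when only about a quarter of postings carry salary data.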

3.1 Data Samples

The fastest way to validate the schema against your system is to load a small sample before subscribing to a full country feed.

Kaggle datasets are available for exploratory analysis.

Need a specific country, date range, or field subset? Contact the data team and we'll provide a targeted sample.

4. References

Questions? Contact us if something is unclear or if you want sample data.