Job Data Overview
1. Data Feed Overview
Techmap collects job postings from 127+ sources - ATS career pages, job boards, aggregators, and employment offices - and packages them as gzip-compressed JSON Lines files, one file per country per day. The Data Feeds product provides a rolling 3-month window updated every 24 hours. Historical Datasets extend that window back to January 2020. Both products are available through AWS Data Exchange, which gives you automatic access to the files in S3 buckets. For small niches or lower volumes, the Job Postings API covers the same 3-month window.
Common use cases
- Daily ingestion pipeline: Subscribe to a country feed, pull the new file each morning from S3, parse the JSON Lines, and upsert records into your database or data warehouse.
- ML training data: Use full posting text and structured fields like position.*, salary.*, and orgTags.* as features for classification, NLP, or salary prediction models.
- Job board backfilling: Load the daily feed for one or more countries and insert new job postings into your job board database each hour or day without building your own crawler.
- Salary benchmarking: Filter records by salary, normalize amounts by currency and to annual figures, and build compensation distributions by role, seniority, and country.
- Company expansion tracking: Group postings by company, country, and location over time to detect when a company starts hiring in a new location - an early signal of geographic expansion.
- Hiring trend analysis: Load dataset files, group by country, dateCreated, and other keywords to build month-over-month hiring volume time series going back to January 2020.
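As an illustration of the salary benchmarking case above, the following is a minimal sketch of annualizing a salary. The field names salary.amount, salary.period, and salary.currency match the code examples in Section 2.2. The period labels and multipliers are assumptions (only 'Weekly' is documented as an example value), so check them against your own files. Currency conversion is left out.

```python
from typing import Optional, Tuple

# Assumed period labels and multipliers. Only 'Weekly' is a documented example
# value, so adjust the keys to whatever your files actually contain.
PERIODS_PER_YEAR = {
    "hourly": 2080,  # 40 hours x 52 weeks (assumption)
    "daily": 260,    # 5 days x 52 weeks (assumption)
    "weekly": 52,
    "monthly": 12,
    "yearly": 1,
}


def annualize(job: dict) -> Optional[Tuple[Optional[str], float]]:
    """Return (currency, annual_amount), or None if the salary is unusable."""
    salary = job.get("salary") or {}  # salary.* is sparse and may be missing
    amount = salary.get("amount")
    period = str(salary.get("period") or "").lower()
    factor = PERIODS_PER_YEAR.get(period)
    # Skip records without a numeric amount or with an unknown period label
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return None
    if factor is None:
        return None
    return salary.get("currency"), amount * factor
```

Records that return None can be counted separately to see how many salaries were skipped because of an unrecognized period.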
1.1 Which Product is Right for You?
Four access options are available depending on your use case:
| | Data Feeds | Historical Datasets | Job Postings API | RSS API |
|---|---|---|---|---|
| Access | AWS Data Exchange | AWS Data Exchange | RapidAPI | RapidAPI |
| Lookback | Last 3 months | Jan. 2020 to today | Last 3 months | Last 3 months |
| Output format | JSON Lines (.json.gz) | JSON Lines (.json.gz) | JSON (default), CSV, RSS, ATOM, XML, Parquet | RSS 2.0 |
| Billing | Per country subscription | Per country purchase | Per request (10 jobs each) | Per request (10-1000 jobs each) |
| Best for | Bulk ETL, recurring ingestion pipelines | Historical analysis, ML training data, one-off research | Real-time queries, filtering, analytics, job board feeds | Hosted job boards consuming RSS (jBoard, NiceBoard, ...) |
| Field reference | This page | This page | Job API Overview | Job API Overview |
1.2 How to Subscribe
Data Feeds and Historical Datasets are available through AWS Data Exchange. Payment is securely handled by AWS and billing is consolidated into your existing AWS account - no separate vendor contract or credentials needed. Once subscribed, new daily files appear automatically in an S3 bucket alias in your own AWS account, ready to pull into any S3-compatible pipeline.
- Browse the listings: Visit the Techmap seller profile on AWS Marketplace and choose the country or region you need.
- Subscribe: Choose a Job Data Feed (last 3 months) or purchase a Historical Dataset (Jan. 2020 to today) for a country. For multi-country bundles contact us.
- Pull files from S3: After subscription, AWS Data Exchange grants you read access to the S3 bucket, where new daily files appear automatically. Use the AWS CLI, boto3, or any S3-compatible tool to pull them into your pipeline.
Test for free: The Luxembourg job dataset is available at no cost and contains approx. 250k postings starting January 2020, growing by roughly 10k records per month.
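The pull step can be sketched with boto3 roughly as follows. The bucket name is a placeholder for the bucket or alias shown in your AWS Data Exchange entitlement, and the assumption that the object key equals the file name (optionally under a prefix) is ours, not something this page confirms. The helper names feed_key and download_feed are hypothetical.

```python
import os
from datetime import date


def feed_key(country_code: str, day: date, prefix: str = "") -> str:
    """Build the file name for one country and day (naming pattern from Section 2.1)."""
    return f"{prefix}techmap-jobs_{country_code.lower()}_{day.isoformat()}.json.gz"


def download_feed(bucket: str, country_code: str, day: date,
                  dest_dir: str = ".", prefix: str = "") -> str:
    """Download one daily file and return the local path.

    Assumes the S3 key is the file name under an optional prefix; adjust
    this to the layout you actually see in your bucket.
    """
    import boto3  # imported lazily so feed_key works without boto3 installed

    key = feed_key(country_code, day, prefix)
    dest = os.path.join(dest_dir, os.path.basename(key))
    boto3.client("s3").download_file(bucket, key, dest)
    return dest


# Example call (placeholder bucket name, requires AWS credentials):
# download_feed("your-data-exchange-bucket-alias", "us", date(2026, 6, 6))
```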
2. File Format & Delivery
The data files are gzip-compressed JSON Lines - one file per country per day. Each line in the decompressed file is a self-contained JSON object representing one job posting. After decompression the JSON can be easily used in Python, Java, Spark, DuckDB, and BigQuery external tables.
2.1 File Structure
Files are named by country code and date: techmap-jobs_{countryCode}_{date}.json.gz. For example, the US feed for 2026-06-06 would be techmap-jobs_us_2026-06-06.json.gz. Country codes follow the ISO 3166-1 alpha-2 standard used throughout the dataset (Exceptions: use "uk" instead of "gb", "gr" instead of "el", and "##" for remote).
Each JSON object in the file maps to the export schema documented in Section 3. Nested fields use dot-notation keys in the schema reference (e.g., salary.*, company.*) but are stored as actual nested objects in the JSON.
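To bridge the dot-notation used in the schema reference and the nested objects in the files, a small lookup helper can be handy. This is a sketch; get_path is a hypothetical helper name, not part of any Techmap tooling.

```python
from typing import Any


def get_path(job: dict, path: str, default: Any = None) -> Any:
    """Resolve a dot-notation path such as 'salary.amount' in a nested dict.

    Returns `default` when any level is missing, is not an object, or when
    the final value is null.
    """
    current: Any = job
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current
```

For example, get_path(job, "location.orgAddress.city") returns the city if present and None otherwise, which avoids chained `or {}` guards for sparse objects.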
salary.* (~23%) and contact.* (~31%) are sparse.
2.2 Code Examples
Read and iterate a JSON Lines file (Python)
import gzip
import json

filename = "techmap-jobs_us_2026-06-06.json.gz"

# Stream the file line by line; each line is one job posting
with gzip.open(filename, "rt", encoding="utf-8") as fh:
    for line in fh:
        job = json.loads(line)
        title = job.get("name", "")
        company = (job.get("company") or {}).get("name", "")
        country = job.get("sourceCC", "")
        date_created = job.get("dateCreated", "")
        text = job.get("text", "")
        print(title, company, country, date_created)
Build a structured DataFrame from a feed file (Python + pandas)
import gzip
import json

import pandas as pd


def load_feed(path: str) -> pd.DataFrame:
    """Flatten selected fields of each posting into one DataFrame row."""
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            job = json.loads(line)
            # Nested objects can be missing or null, so fall back to {}
            company = job.get("company") or {}
            location = (job.get("location") or {}).get("orgAddress") or {}
            position = job.get("position") or {}
            salary = job.get("salary") or {}
            records.append({
                "id": job.get("idInSource"),
                "source": job.get("source"),
                "country": job.get("sourceCC"),
                "date_created": job.get("dateCreated"),
                "title": job.get("name"),
                "company_name": company.get("name"),
                "city": location.get("city"),
                "state": location.get("state"),
                "work_type": position.get("workType"),
                "career_level": position.get("careerLevel"),
                "contract_type": position.get("contractType"),
                "salary_amount": salary.get("amount"),
                "salary_currency": salary.get("currency"),
                "salary_period": salary.get("period"),
                # `or ""` also covers an explicit null text value
                "text_length": len(job.get("text") or ""),
                "url": job.get("url"),
            })
    return pd.DataFrame(records)


df = load_feed("techmap-jobs_us_2026-06-06.json.gz")
print(df.shape)
print(df.dtypes)
Load multiple daily files and aggregate (Python + DuckDB)
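import glob
import gzip
import os
import tempfile

import duckdb


def extract_jsonl(src_gz: str, dst_jsonl: str) -> None:
    """Decompress one gzip feed file into a plain JSON Lines file."""
    with gzip.open(src_gz, "rt", encoding="utf-8") as fh, \
            open(dst_jsonl, "w", encoding="utf-8") as out:
        for line in fh:
            out.write(line)


# Write JSONL to a temp directory, then query all files with DuckDB
tmp_dir = tempfile.mkdtemp()
for src in glob.glob("techmap-jobs_us_*.json.gz"):
    dst = os.path.basename(src).replace(".json.gz", ".jsonl")
    extract_jsonl(src, os.path.join(tmp_dir, dst))

con = duckdb.connect()
pattern = os.path.join(tmp_dir, "*.jsonl")
# read_ndjson_objects yields one JSON value per line, aliased here as column j.
# The month prefix assumes dateCreated is stored as an ISO 8601 string.
result = con.execute(f"""
    SELECT
        json_extract_string(j, '$.sourceCC') AS country,
        substr(json_extract_string(j, '$.dateCreated'), 1, 7) AS month,
        COUNT(*) AS postings
    FROM read_ndjson_objects('{pattern}') AS t(j)
    GROUP BY 1, 2
    ORDER BY 2, 3 DESC
""").fetchdf()
print(result.head(20))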
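Upsert a daily file into a database (Python + SQLite)
For the daily ingestion use case, the sketch below upserts one feed file into a local SQLite table. It assumes that the pair (source, idInSource) identifies a posting uniquely; this page does not state that explicitly, so verify it on your data. The table layout and the function name upsert_feed are illustrative only.

```python
import gzip
import json
import sqlite3


def upsert_feed(path: str, con: sqlite3.Connection) -> int:
    """Insert or replace every posting from one feed file; return lines processed."""
    con.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            source       TEXT NOT NULL,
            id_in_source TEXT NOT NULL,
            title        TEXT,
            raw          TEXT,
            PRIMARY KEY (source, id_in_source)  -- assumed unique key
        )
    """)
    processed = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            job = json.loads(line)
            # A later record with the same key replaces the earlier one
            con.execute(
                "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?)",
                (job.get("source"), job.get("idInSource"), job.get("name"), line),
            )
            processed += 1
    con.commit()
    return processed
```

Keeping the raw line alongside a few extracted columns lets you re-derive further fields later without re-downloading the file. For a data warehouse, the same idea maps to a MERGE or equivalent statement.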
3. Data Dictionary
The export schema below covers the fields available in the JSON Lines files delivered via Data Feeds and Historical Datasets. Fields marked with .* are nested objects - expand them with your JSON parser. The % Populated column reflects fill rates measured across a representative data slice; actual rates may vary by country and over time.
| Field | Type | % Populated | Description |
|---|---|---|---|
| source | String | 100% | The origin of the job posting with a country code, such as 'monster_us'. |
| sourceCC | String (ISO3166) | 100% | The country code (ISO 3166) of the country where the job posting is located, e.g., 'us'. |
| idInSource | String | 100% | The ID used in the source the job posting originated from. |
| name | String | 100% | The title of the job posting. |
| url | String (URL) | 100% | The link where we found the job posting. |
| text | String | 100% | The text of the job posting extracted from the html field. |
| html | String (HTML) | 100% | The original HTML used on the website or stored in the JSON of the job posting page. |
| json | Object | 100% | The original JSON objects found in the job posting page, including schema.org job posting data if available. |
| referenceID | String | 30% | An ID stated on the job posting page, given by the original company for internal use. |
| position.* | Object | 100% | Data concerning the job position such as name, contract type (e.g., Permanent), work type (e.g., full-time), or career level (e.g., Junior). |
| salary.* | Object | 23% | Data concerning the job salary such as amount and period (e.g., 'Weekly'). |
| contact.* | Object | 31% | Data about contact information such as contact name, email, phone, or physical address. |
| orgTags.* | Object | 100% | Tags found on the job posting page (e.g., skills such as 'Java'), in the JSON (e.g., benefits) or from the browsing hierarchy (e.g., 'IT Jobs'). |
| location.orgAddress.* | Object | 100% | Data about the location of the job as originally stated on the job posting or its JSON. |
| company.* | Object | 100% | Data about the company of the job as originally stated on the job posting or its JSON. |
| locale | String | 28% | The language the job posting is written in as a language locale (e.g., 'en_US'). |
| dateCreated | Date | 100% | The date and time the job posting was created by the company (as stated on the page or its JSON). |
| dateScraped | Date | 100% | The date and time the job posting was loaded and analyzed. |
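Because fill rates vary by country and over time, it can help to measure them on the files you actually receive. The sketch below counts how often each top-level field is populated in one file. How the % Populated column was computed is not documented here, so treating null, empty strings, empty lists, and empty objects as "not populated" is an assumption.

```python
import gzip
import json
from collections import Counter


def is_populated(value) -> bool:
    """Treat null, '', [] and {} as empty (assumed definition of 'populated')."""
    return value not in (None, "", [], {})


def fill_rates(path: str) -> dict:
    """Return {top_level_field: share of records where it is populated}."""
    total = 0
    filled = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            job = json.loads(line)
            total += 1
            filled.update(k for k, v in job.items() if is_populated(v))
    if total == 0:
        return {}
    return {field: count / total for field, count in filled.items()}
```

Comparing the result for fields such as salary, contact, and locale against the table above gives a quick check on whether a given country feed suits a salary or contact-based use case.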
3.1 Data Samples
The fastest way to validate the schema against your system is to load a small sample before subscribing to a full country feed.
- Luxembourg job dataset (free) - approx. 250k postings starting January 2020, growing ~10k per month. Good for end-to-end pipeline testing.
- Sample file from April 2023 - 1.4k records, 3.3 MB gzip, covering 125+ sources and 250 countries.
Kaggle datasets for exploratory analysis:
- US Job Postings - May 2023 (33k records, 805 MB gzip)
- Ireland Job Postings - October 2022 (37k records, 101 MB gzip)
- Ireland Job Postings - October 2021 (25k records, 56 MB gzip)
- Ireland Job Postings - October 2020 (30k records, 58 MB gzip)
- International Job Postings - September 2021 (3.4M records, 8 GB gzip)
Need a specific country, date range, or field subset? Contact the data team and we'll provide a targeted sample.
4. References
- Job API Overview - API endpoints, query parameters, response schema, and pagination guide
- Portal Explorer - all 127+ sources with country coverage
- Country Coverage - available country codes and feed availability
- Techmap on AWS Marketplace - subscribe to Data Feeds or purchase Historical Datasets
- JSON Lines specification - the file format used in all deliveries
- Techmap on Kaggle - free sample datasets for exploratory analysis
- FAQ - common questions about data quality, update frequency, and coverage
Questions? Contact us if something is unclear or if you want sample data.