Developer Resources

1. Downloading Data from AWS Data Exchange (S3 bucket access)

We have listed many of our datasets on AWS Data Exchange (ADX). You can find the list of Techmap's data products here. To get a first impression and test the workflow, you can access Techmap's Luxembourg dataset for free.

1.1 List all files in the AWS S3 Bucket for a country

To list the files you're subscribed to, you can use the list-objects-v2 command. The BUCKET_ALIAS can be found in the "Entitled Data" section of your dataset in AWS Data Exchange.

aws s3api list-objects-v2 --request-payer requester \
  --bucket BUCKET_ALIAS \
  --prefix COUNTRY_PREFIX
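
Note: the --request-payer requester flag acknowledges that the bucket uses S3's Requester Pays model, i.e., the account making the request is billed for the request and data transfer.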

1.2 Download a file from the AWS S3 Bucket for a country

To download files you're subscribed to, you can use the get-object command. For example, to download the file "techmap_jobs_lu_2023-07-01.jsonl.gz" from the Luxembourg dataset, execute:

aws s3api get-object --request-payer requester \
  --bucket BUCKET_ALIAS \
  --key lu/techmap_jobs_lu_2023-07-01.jsonl.gz \
  techmap_jobs_lu_2023-07-01.jsonl.gz
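
The downloaded files are gzip-compressed JSON Lines (.jsonl.gz) files. Decompress them (e.g., with gunzip) before parsing, or decompress them on the fly as sketched in the parsing section below.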

2. Downloading Data from AWS S3 (via Proprietary Contracts)

To list and download the files you ordered, you can use the following commands. Please note that you only have access to the time range and countries you ordered.

2.1 List all countries in our AWS S3 Bucket

aws s3 ls --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  --summarize s3://BUCKET_NAME

2.2 List all files in the UK directory

aws s3 ls --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  --summarize s3://BUCKET_NAME/uk/

2.3 Download one file for the USA from May 4th 2023

aws s3 cp --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  s3://BUCKET_NAME/us/techmap_jobs_us_2023-05-04.jsonl.gz .

2.4 Download all files for the USA from April 2023

aws s3 sync --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  s3://BUCKET_NAME/us/ . \
  --exclude "*" --include "techmap_jobs_us_2023-04-*"

3. Parsing Data Files

In JavaScript / TypeScript you can use JSON.parse() to parse each line of the JSON Lines files within a directory.

const fs = require('fs');
const path = require('path');
const readline = require('readline');

const directoryPath = './techmap/files/...';

fs.readdir(directoryPath, function (err, files) {
  if (err) {
    console.log('Unable to scan directory: ' + err);
    return;
  }
  // Iterate through each file in the directory
  files.forEach(function (file) {
    // Check if the file is a JSON / JSON Lines file (decompress .gz files first)
    if (file.endsWith('.json') || file.endsWith('.jsonl')) {
      // Read the file
      const readStream = fs.createReadStream(path.join(directoryPath, file));
      const rl = readline.createInterface({
        input: readStream,
        crlfDelay: Infinity
      });
      rl.on('line', function (line) {
        try {
          // Parse the JSON data in the line
          const jsonData = JSON.parse(line);
          // TODO: do something with the JSON - e.g., store in your own DB
          console.log(jsonData);
        } catch (err) {
          console.log('Unable to parse JSON data in file ' + file + ' on line: ' + line + ': ' + err);
        }
      });
    }
  });
});
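
Since the data files are delivered gzip-compressed, you may also want to parse them without unpacking them to disk first. Below is a minimal sketch using Node's built-in zlib module to decompress the stream on the fly; the file path is just an example and needs to be adjusted to one of your downloaded files.

const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Example path - adjust to one of your downloaded .jsonl.gz files
const filePath = './techmap/files/techmap_jobs_lu_2023-07-01.jsonl.gz';

const rl = readline.createInterface({
  // Decompress the gzip stream on the fly
  input: fs.createReadStream(filePath).pipe(zlib.createGunzip()),
  crlfDelay: Infinity
});

rl.on('line', function (line) {
  try {
    // Parse the JSON data in the line
    const jsonData = JSON.parse(line);
    // TODO: do something with the JSON - e.g., store in your own DB
    console.log(jsonData);
  } catch (err) {
    console.log('Unable to parse JSON data on line: ' + line + ': ' + err);
  }
});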

With the Java programming language you can use the javax.json library (the Java API for JSON Processing) to parse all lines in the JSON files within a directory.

import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import javax.json.*;

String directoryPath = "./techmap/files/...";

// Parse each line of every JSON / JSON Lines file in the directory
List<JsonObject> jsonObjects = Files.list(Path.of(directoryPath))
  .filter(path -> path.toString().endsWith(".json") || path.toString().endsWith(".jsonl"))
  .flatMap(path -> {
    try {
      return Files.lines(path);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  })
  .map(line -> {
    try (JsonReader reader = Json.createReader(new StringReader(line))) {
      return reader.readObject();
    }
  })
  .collect(Collectors.toList());

// Do something with the list of JSON objects
for (JsonObject jsonObject : jsonObjects) {
  // TODO: do something with the JSON - e.g., store in your own DB
  System.out.println(jsonObject);
}
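
To read the compressed .jsonl.gz files directly in Java, you can wrap the file's InputStream in a java.util.zip.GZIPInputStream before passing it to a reader.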

For other programming languages, see the libraries section of the JSON homepage. And remember that we have sample data you can test with.

Need Help? If you have any additional questions about working with our Job Datasets or Data Streams, please don't hesitate to contact us. We're here to help!