Developer Resources
1. Downloading Data from AWS Data Exchange (S3 bucket access)
We have listed many of our datasets on AWS Data Exchange (ADX). You can find the list of Techmap's data products here. To get a first impression and test the workflow, you can access Techmap's Luxembourg dataset for free.
1.1 List all files in the AWS S3 Bucket for a country
To list the files you're subscribed to, use the list-objects-v2 command. You can find the BUCKET_ALIAS in the dataset's entry in the "Entitled Data" section of AWS Data Exchange. COUNTRY_PREFIX is the country's directory, for example lu/ for Luxembourg.
aws s3api list-objects-v2 --request-payer requester \
--bucket BUCKET_ALIAS \
--prefix COUNTRY_PREFIX
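The list-objects-v2 command prints a JSON document whose Contents array holds one entry per object. The following sketch shows one way to pull the data file keys out of that output in JavaScript. The helper name extractKeys and the sample values are assumptions made for illustration; only the Contents and Key fields come from the AWS CLI output format.

```javascript
// Hypothetical helper: extract object keys from the JSON text that
// `aws s3api list-objects-v2` writes to stdout.
function extractKeys(listOutput, suffix = '.jsonl.gz') {
  // An empty result may produce no output at all.
  if (listOutput.trim() === '') return [];
  const parsed = JSON.parse(listOutput);
  // `Contents` is absent when no object matches the prefix.
  const contents = parsed.Contents || [];
  return contents
    .map((entry) => entry.Key)
    .filter((key) => key.endsWith(suffix));
}

// Hand-written sample response (sizes are illustrative only).
const sample = JSON.stringify({
  Contents: [
    { Key: 'lu/', Size: 0 },
    { Key: 'lu/techmap_jobs_lu_2023-07-01.jsonl.gz', Size: 12345 },
  ],
});
const keys = extractKeys(sample);
console.log(keys);
```

You could feed it the CLI output by redirecting the command into a file and reading that file with fs.readFileSync.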
1.2 Download a file from the AWS S3 Bucket for a country
To download files you're subscribed to, use the get-object command. For example, to download the file "techmap_jobs_lu_2023-07-01.jsonl.gz" from the Luxembourg dataset, you can execute:
aws s3api get-object --request-payer requester \
--bucket BUCKET_ALIAS \
--key lu/techmap_jobs_lu_2023-07-01.jsonl.gz \
techmap_jobs_lu_2023-07-01.jsonl.gz
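The key in the example above follows the pattern COUNTRY/techmap_jobs_COUNTRY_DATE.jsonl.gz. Assuming this pattern holds for the other countries and dates you are entitled to (the examples here only show lu and us), a small helper can build the value for --key. The function name buildObjectKey is hypothetical.

```javascript
// Hypothetical helper: build an object key following the pattern seen in
// the examples, e.g. lu/techmap_jobs_lu_2023-07-01.jsonl.gz.
// Assumption: every country and date uses the same naming scheme.
function buildObjectKey(countryCode, date) {
  const cc = countryCode.toLowerCase();
  if (!/^[a-z]{2}$/.test(cc)) {
    throw new Error('Expected a two-letter country code, got: ' + countryCode);
  }
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error('Expected a date formatted as YYYY-MM-DD, got: ' + date);
  }
  return `${cc}/techmap_jobs_${cc}_${date}.jsonl.gz`;
}

const key = buildObjectKey('LU', '2023-07-01');
console.log(key);
```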
2. Downloading Data from AWS S3 (via Proprietary Contracts)
To list and download the files you ordered, you can use the following commands. Please note that you only have access to the time range and countries you ordered.
2.1 List all countries in our AWS S3 Bucket
aws s3 ls --profile YOUR_CREDENTIALS_FOR_TECHMAP \
--summarize s3://BUCKET_NAME
2.2 List all files in the UK directory
aws s3 ls --profile YOUR_CREDENTIALS_FOR_TECHMAP \
--summarize s3://BUCKET_NAME/uk/
2.3 Download one file for the USA from May 4th 2023
aws s3 cp --profile YOUR_CREDENTIALS_FOR_TECHMAP \
s3://BUCKET_NAME/us/techmap_jobs_us_2023-05-04.jsonl.gz .
2.4 Download all files for the USA from April 2023
aws s3 sync --profile YOUR_CREDENTIALS_FOR_TECHMAP \
s3://BUCKET_NAME/us/ . \
--exclude "*" --include "techmap_jobs_us_2023-04-*"
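In the sync command above, --exclude "*" first drops every file and the later --include pattern re-adds the matching ones, because filters that appear later on the command line take precedence. The sketch below applies the same selection to a list of file names in JavaScript, for example to check which local files belong to a given month. The function name selectMonth and the sample file names are illustrative assumptions.

```javascript
// Hypothetical helper mirroring
//   --exclude "*" --include "techmap_jobs_us_2023-04-*"
// for a plain list of file names.
function selectMonth(fileNames, countryCode, yearMonth) {
  const prefix = `techmap_jobs_${countryCode}_${yearMonth}-`;
  return fileNames.filter((name) => name.startsWith(prefix));
}

// Illustrative file names following the pattern used in this section.
const names = [
  'techmap_jobs_us_2023-03-31.jsonl.gz',
  'techmap_jobs_us_2023-04-01.jsonl.gz',
  'techmap_jobs_us_2023-04-30.jsonl.gz',
  'techmap_jobs_us_2023-05-04.jsonl.gz',
];
const april = selectMonth(names, 'us', '2023-04');
console.log(april);
```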
3. Parsing Data Files
In JavaScript / TypeScript, you can use JSON.parse() to parse every line of the decompressed JSON files in a directory (each line holds one JSON document).
const fs = require('fs');
const path = require('path');
const readline = require('readline');

const directoryPath = './techmap/files/...';

fs.readdir(directoryPath, function (err, files) {
  if (err) {
    console.error('Unable to scan directory: ' + err);
    return;
  }
  // Iterate through each file in the directory
  files.forEach(function (file) {
    // Only process decompressed JSON / JSON Lines files
    if (['.json', '.jsonl'].includes(path.extname(file))) {
      // Stream the file line by line; path.join inserts the path separator
      const readStream = fs.createReadStream(path.join(directoryPath, file));
      const rl = readline.createInterface({
        input: readStream,
        crlfDelay: Infinity
      });
      rl.on('line', function (line) {
        // Skip blank lines, e.g. a trailing newline at the end of the file
        if (line.trim() === '') return;
        try {
          // Parse the JSON data in the line
          const jsonData = JSON.parse(line);
          // TODO: do something with the JSON - e.g., store in your own DB
          console.log(jsonData);
        } catch (err) {
          console.error('Unable to parse JSON data in file ' + file + ' on line: ' + line + ': ' + err);
        }
      });
    }
  });
});
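The files downloaded in the sections above end in .jsonl.gz, so they need to be decompressed before their lines can be parsed. As a minimal sketch, Node's built-in zlib module can do this in memory. The helper name readGzippedJsonLines and the sample records are assumptions for illustration and do not reflect the real data schema. For large files, a streaming approach with zlib.createGunzip() would be preferable to loading everything at once.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Hypothetical helper: decompress a .jsonl.gz file in memory and parse
// every non-empty line. Lines that fail to parse are collected, not thrown.
function readGzippedJsonLines(filePath) {
  const text = zlib.gunzipSync(fs.readFileSync(filePath)).toString('utf8');
  const records = [];
  const errors = [];
  text.split('\n').forEach((line, index) => {
    if (line.trim() === '') return; // skip blank lines
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      errors.push({ lineNumber: index + 1, message: err.message });
    }
  });
  return { records, errors };
}

// Self-contained demo: write a tiny gzipped sample to a temp directory.
// The records are made up and only serve to exercise the helper.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'techmap-demo-'));
const demoFile = path.join(demoDir, 'sample.jsonl.gz');
fs.writeFileSync(demoFile, zlib.gzipSync('{"id":1}\nnot json\n{"id":2}\n'));

const result = readGzippedJsonLines(demoFile);
console.log(result.records.length + ' records, ' + result.errors.length + ' errors');
```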
In Java, you can use the javax.json (JSON-P) API to parse all lines in the JSON files within a directory. A JSON-P implementation must be on the classpath.
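import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.json.*;

public class ParseTechmapFiles {
    public static void main(String[] args) throws IOException {
        String directoryPath = "./techmap/files/...";
        List<JsonObject> jsonObjects;
        // Parse every line of every decompressed JSON file in the directory
        try (Stream<Path> paths = Files.list(Path.of(directoryPath))) {
            jsonObjects = paths
                .filter(path -> path.toString().endsWith(".json") || path.toString().endsWith(".jsonl"))
                .flatMap(path -> {
                    try {
                        // flatMap closes each per-file stream once it is consumed
                        return Files.lines(path);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .filter(line -> !line.isBlank())
                .map(line -> {
                    // Each line holds one JSON object
                    try (JsonReader reader = Json.createReader(new StringReader(line))) {
                        return reader.readObject();
                    }
                })
                .collect(Collectors.toList());
        }
        // Do something with the list of JSON objects
        for (JsonObject jsonObject : jsonObjects) {
            // TODO: do something with the JSON - e.g., store in your own DB
            System.out.println(jsonObject.toString());
        }
    }
}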
For other programming languages, see the libraries section of the JSON homepage. Remember that we also provide sample data you can test with.