Developer Resources

Downloading Data from AWS S3

To list and download the files you ordered, use the following AWS CLI commands. Note that you can only access the time range and countries included in your order.

List all countries in our AWS S3 Bucket

aws s3 ls --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  --summarize s3://BUCKET_NAME

List all files in UK directory

aws s3 ls --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  --summarize s3://BUCKET_NAME/uk/

Download one file for the USA from May 4th 2023

aws s3 cp --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  s3://BUCKET_NAME/us/techmap_jobs_us_2023-05-04.jsonl.gz .

Download all files for the USA from April 2023

aws s3 sync --profile YOUR_CREDENTIALS_FOR_TECHMAP \
  s3://BUCKET_NAME/us/ . \
  --exclude "*" --include "techmap_jobs_us_2023-04-*"
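
After a sync, it can help to check the download against the file names you expect. The following is a minimal Node.js sketch, assuming one file per day and the naming pattern seen in the commands above (techmap_jobs_<country>_<YYYY-MM-DD>.jsonl.gz). The helper name expectedFileNames is hypothetical, and the pattern is inferred from a single example, so verify it against your own bucket listing.

```javascript
// Hypothetical helper: build the daily file names for one country and month.
// Assumes the pattern techmap_jobs_<country>_<YYYY-MM-DD>.jsonl.gz, inferred
// from the example file name above; it is not a documented guarantee.
function expectedFileNames(country, year, month) {
  // Day 0 of the following month is the last day of the requested month.
  const daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();
  const mm = String(month).padStart(2, '0');
  const names = [];
  for (let day = 1; day <= daysInMonth; day += 1) {
    const dd = String(day).padStart(2, '0');
    names.push(`techmap_jobs_${country}_${year}-${mm}-${dd}.jsonl.gz`);
  }
  return names;
}

// Example: the files one would expect for the USA in April 2023.
const aprilFiles = expectedFileNames('us', 2023, 4);
console.log(aprilFiles.length, aprilFiles[0]);
```

Comparing this list with the contents of the local directory shows which days are missing, if any.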

Parsing Data Files

In JavaScript / TypeScript (Node.js), you can use JSON.parse() to parse each line of the JSON files in a directory.

const fs = require('fs');
const path = require('path');
const readline = require('readline');

const directoryPath = './techmap/files/...';

fs.readdir(directoryPath, function (err, files) {
  if (err) {
    console.log('Unable to scan directory: ' + err);
    return;
  }
  // Iterate through each file in the directory
  files.forEach(function (file) {
    // Check if file is a JSON file (.json, or .jsonl for JSON Lines)
    if (['.json', '.jsonl'].includes(path.extname(file))) {
      // Read the file (path.join inserts the path separator between directory and file name)
      const readStream = fs.createReadStream(path.join(directoryPath, file));
      const rl = readline.createInterface({
        input: readStream,
        crlfDelay: Infinity
      });
      rl.on('line', function (line) {
        try {
          // Parse the JSON data in the line
          const jsonData = JSON.parse(line);
          // TODO: do something with the JSON - e.g., store in your own DB
          console.log(jsonData);
        } catch (err) {
          console.log('Unable to parse JSON data in file ' + file + ' on line: ' + line + ': ' + err);
        }
      });
    }
  });
});
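
The downloaded files end in .jsonl.gz, so they can also be read without unpacking them first. The following is a minimal sketch, assuming the files are gzip-compressed with one JSON object per line, as the extension suggests. The helper name readGzippedJsonLines is hypothetical, and error handling for missing or corrupt files is left out for brevity.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Hypothetical helper: stream a gzip-compressed JSON Lines file and call
// onRecord for every parsed line. Resolves with the number of records read.
// Assumption: one JSON object per line; a malformed line rejects the promise.
async function readGzippedJsonLines(filePath, onRecord) {
  // Decompress on the fly so the whole file never has to fit in memory
  const input = fs.createReadStream(filePath).pipe(zlib.createGunzip());
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let count = 0;
  for await (const line of rl) {
    if (line.trim() === '') continue; // skip blank lines
    onRecord(JSON.parse(line));
    count += 1;
  }
  return count;
}

// Usage, with the file name from the download example above:
// readGzippedJsonLines('techmap_jobs_us_2023-05-04.jsonl.gz', (job) => {
//   // TODO: do something with the JSON - e.g., store in your own DB
//   console.log(job);
// }).then((n) => console.log('Parsed ' + n + ' records'));
```

Streaming keeps memory use flat regardless of file size, at the cost of asynchronous control flow.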

In Java, you can use the javax.json (JSON-P) API to parse each line of the JSON files in a directory.

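import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.json.*;

String directoryPath = "./techmap/files/...";

// Parse every line of every JSON file in the directory (one JSON object per line).
// Files.list can throw IOException: handle or declare it in the enclosing method.
List<JsonObject> jsonObjects;
try (Stream<Path> paths = Files.list(Path.of(directoryPath))) {
  jsonObjects = paths
    .filter(path -> path.toString().endsWith(".json"))
    .flatMap(path -> {
      try {
        // Read all lines of the file
        return Files.readAllLines(path).stream();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    })
    .filter(line -> !line.isBlank())
    .map(line -> {
      // Parse the JSON data in the line
      try (JsonReader reader = Json.createReader(new StringReader(line))) {
        return reader.readObject();
      }
    })
    .collect(Collectors.toList());
}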

// Do something with the list of JSON objects
for (JsonObject jsonObject : jsonObjects) {
    // Process each JSON object
    // TODO: do something with the JSON - e.g., store in your own DB
    System.out.println(jsonObject.toString());
}

For other programming languages, see the libraries section of the JSON homepage. Remember that we also provide sample data you can test with.

Need Help? If you have any additional questions about working with our Job Datasets or Data Streams, please don't hesitate to contact us. We're here to help!