Introduction
As the semester comes to an end, it’s the perfect time to apply the skills you’ve learned in data science to a real-world problem. This tutorial will guide you through analyzing the job market using Python, an API, and data analysis techniques.
By the end of this project, you’ll have a deeper understanding of salary trends, job demand, and how to automate data collection and analysis. This project is designed for data science students who want to explore the job market while honing their Python and data analysis skills.
Why Analyze the Job Market?
Understanding the job market is crucial for making informed career decisions. By analyzing job postings, you can identify trends such as in-demand skills, salary ranges, and geographic hotspots for specific roles. This knowledge can help you tailor your resume, decide on locations, negotiate salaries, and choose career paths with strong growth potential. Additionally, this project will give you hands-on experience with APIs, data cleaning, and statistical analysis - skills that are highly valuable in the data science field.
In this project, we’ll focus on analyzing salary data for software developer roles in the United States for the last month. We’ll use the Daily International Job Postings API to fetch job postings, filter them based on specific criteria, and perform statistical analysis on the salary data.
Project Setup
Before diving into the analysis, ensure your environment is properly configured. You'll need Python installed, along with libraries like pandas, requests, and python-dotenv for handling data and API requests. Create a .env file to securely store your API key, and set up a project directory to organize your code and data files. This setup will streamline the process of fetching, analyzing, and visualizing job market data.
Prerequisites
To complete this project, you’ll need a few tools and some foundational knowledge to ensure a smooth experience. This project is designed to be beginner-friendly, but having the following prerequisites will help you get the most out of it:
- Basic knowledge of Python: Familiarity with variables, functions, loops, and libraries like Pandas will help you follow along with the code and customize it for your needs.
- Python installed on your computer: Download and install Python from python.org. Ensure you have Python 3.6 or later, as some libraries used in this project may not be compatible with older versions.
- A text editor or IDE: Use tools like VS Code, PyCharm, or Jupyter Notebook to write and run your Python code. These environments provide helpful features like syntax highlighting, debugging, and code completion.
- A RapidAPI account: Sign up for a free account at RapidAPI to access the Job Postings API. This platform allows you to manage your API subscriptions and keys.
- An API key for the Job Postings API: After signing up for RapidAPI, subscribe to the Daily International Job Postings API to obtain your API key. This key will authenticate your requests and allow you to fetch job posting data.
Once you have these tools and resources ready, you’ll be well-prepared to start fetching and analyzing job market data. If you’re new to APIs or Python, don’t worry - this tutorial will guide you step by step!
Installing Required Libraries
To successfully fetch, process, and analyze job market data, we’ll rely on several Python libraries. These libraries simplify tasks like making API requests, handling data, and managing sensitive information. Here’s what we need each library for:
- requests: This library is used to send HTTP requests to the Daily International Job Postings API. It allows us to retrieve job posting data in JSON format, which we can then process and analyze.
- pandas: A powerful library for data manipulation and analysis. With pandas, we can clean, filter, and transform the raw job data into a structured format, making it easier to perform calculations and generate insights.
- python-dotenv: This library helps manage environment variables, such as your API key, securely. By storing sensitive information in a .env file, we avoid hardcoding it into the script, which improves security and makes the code more maintainable.
To install these libraries, run the following command in your terminal or command prompt:
pip install requests pandas python-dotenv
Once installed, you’ll be ready to start fetching and analyzing job market data with ease!
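If you'd like to confirm the installation before moving on, an optional quick check is to import each library and print its version; if the imports succeed, your environment is ready. This snippet is just a sanity check and is not required for the rest of the tutorial:
# Quick sanity check: if these imports succeed, the libraries are installed.
import requests
import pandas as pd
from dotenv import load_dotenv  # Provided by the python-dotenv package

print("requests version:", requests.__version__)
print("pandas version:", pd.__version__)
print("python-dotenv imported successfully")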
Creating a .env File
To keep your API key secure, store it in a .env file. Create a file named .env in your project directory and add the following line:
RAPIDAPI_KEY=your_api_key_here
Replace your_api_key_here with your actual RapidAPI key.
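Before making any API calls, you can optionally verify that the key loads correctly. This small check simply reads the .env file and reports whether RAPIDAPI_KEY is set:
import os
from dotenv import load_dotenv

load_dotenv()  # Reads the .env file in the current directory

if os.getenv("RAPIDAPI_KEY"):
    print("RAPIDAPI_KEY loaded successfully.")
else:
    print("RAPIDAPI_KEY not found - check your .env file.")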
Fetching Job Data
Now that our environment is set up, let’s dive into writing Python code to fetch and analyze job postings from the API. This process is broken down into four key steps, each building on the previous one to ensure a seamless workflow:
Step 1: Define the API Endpoint and Parameters
Before fetching data, we need to configure the API request by defining the endpoint and query parameters. This step involves specifying details like the country, occupation, and date range for the job postings. By setting these parameters, we ensure that the API returns only the most relevant data for our analysis.
import os
import requests
from datetime import datetime, timedelta
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# API configuration
API_URL = "https://daily-international-job-postings.p.rapidapi.com/api/v2/jobs/search"
API_KEY = os.getenv("RAPIDAPI_KEY")
MAX_REQUESTS = 2  # Cap the number of requests (pages), as we only have 25 free requests

# Create last month string in the format "yyyy-MM"
last_month = (datetime.today().replace(day=1) - timedelta(days=1)).strftime("%Y-%m")

# Define query parameters
params = {
    "countryCode": "us",         # Jobs in the USA
    "occupation": "programmer",  # Software developer roles
    "dateCreated": last_month,   # Jobs from last month
    "hasSalary": "true"          # Only jobs with salary information
}
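Before writing the full pagination loop in the next step, you can optionally sanity-check the configuration with a single request (keep in mind this uses up one of your free requests). The totalCount and result fields read here are the same ones the fetch function relies on below:
# Optional: one request to confirm the API key and parameters work.
headers = {
    "X-RapidAPI-Key": API_KEY,
    "X-RapidAPI-Host": "daily-international-job-postings.p.rapidapi.com"
}
response = requests.get(API_URL, headers=headers, params={**params, "page": 1})
response.raise_for_status()
preview = response.json()
print("Total matching postings:", preview.get("totalCount", 0))
print("Postings on the first page:", len(preview.get("result", [])))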
Step 2: Fetch Job Postings
Once the API request is configured, we’ll write a function to fetch the job postings. This step involves making HTTP requests to the API, handling pagination to retrieve all available results, and storing the data in a structured format. Pagination is crucial because APIs often limit the number of results returned per request, so we need to loop through multiple pages to collect all the data.
def fetch_job_postings(base_params):
    all_jobs = []
    page = 1
    total_count = float("inf")  # Initialize to infinity to enter the loop
    while len(all_jobs) < total_count and page <= MAX_REQUESTS:
        params = {**base_params, "page": page}
        headers = {
            "X-RapidAPI-Key": API_KEY,
            "X-RapidAPI-Host": "daily-international-job-postings.p.rapidapi.com"
        }
        try:
            response = requests.get(API_URL, headers=headers, params=params)
            response.raise_for_status()  # Raise an error for bad responses
            data = response.json()
            total_count = data.get("totalCount", 0)
            if not data.get("result"):
                break
            all_jobs.extend(data["result"])  # Accumulate jobs
            page += 1
        except requests.exceptions.RequestException as error:
            print(f"Error fetching jobs for page {page}: {error}")
            break
    print(f"Fetched {len(all_jobs)} jobs of {total_count} in total from {page - 1} pages.")
    return all_jobs
Step 3: Format and Save the Data
After fetching the raw data, we’ll clean and format it for easier analysis. This step includes organizing the data into a tabular format (e.g., a Pandas DataFrame) and saving it to a file (e.g., a TSV or CSV file). Proper formatting ensures that the data is ready for statistical analysis and visualization.
import pandas as pd

def format_job_postings(job_postings):
    data = []
    for job in job_postings:
        row = {
            "Occupation": job.get("occupation", ""),
            "Min Salary": job.get("minSalary", 0),
            "Currency": job.get("jsonLD", {}).get("salaryCurrency", ""),
            "Location": job.get("city", ""),
            "Industry": job.get("industry", ""),
            "Work Place": job.get("workPlace", ""),
            "Title": job.get("title", ""),
            "URL": job.get("jsonLD", {}).get("url", "")
        }
        data.append(row)
    df = pd.DataFrame(data)
    return df

# Save the data to a TSV file
def save_to_tsv(df, filename="salaries_analysis.tsv"):
    df.to_csv(filename, sep="\t", index=False)
    print(f"Salary data successfully written to {filename}")
Step 4: Calculate Key Metrics
Once the data is in a usable format, we'll calculate key metrics such as the average salary, median salary, and salary breakdowns by industry, location, and workplace type. This step provides actionable insights into the job market, helping us understand trends like salary ranges, outliers, and variations by location or job title.
def analyze_salaries(df):
    # Filter out unrealistic salaries
    df_filtered = df[(df["Min Salary"] > 20000) & (df["Min Salary"] < 500000)]

    # Average Salary
    mean_salary = df_filtered["Min Salary"].mean()
    print(f"Mean yearly min. salary: {mean_salary:,.0f}")

    # Mode of Salary
    mode_salary = df_filtered["Min Salary"].mode()
    print(f"Mode yearly min. salary: {mode_salary.iloc[0]:,.0f}" if not mode_salary.empty else "Mode yearly min. salary: N/A")

    # Median of Salary
    median_salary = df_filtered["Min Salary"].median()
    print(f"Median yearly min. salary: {median_salary:,.0f}")

    # Highest Salary
    highest_salary = df_filtered["Min Salary"].max()
    print(f"Highest yearly min. salary: {highest_salary:,.0f}")

    # Lowest Salary
    lowest_salary = df_filtered["Min Salary"].min()
    print(f"Lowest yearly min. salary: {lowest_salary:,.0f}")

    # Count outliers
    outliers = df[df["Min Salary"] > 500000].shape[0]
    print(f"Outliers above 500,000: {outliers}")

    # Salary by Location
    location_salaries = df_filtered.groupby("Location")["Min Salary"].mean().astype(int).sort_values(ascending=False)
    location_salaries = location_salaries.apply(lambda x: f"{x:,}")
    print(f"\nMean Salary by Location:\n{location_salaries}")

    # Salary by Industry
    industry_salaries = df_filtered.groupby("Industry")["Min Salary"].mean().astype(int).sort_values(ascending=False)
    industry_salaries = industry_salaries.apply(lambda x: f"{x:,}")
    print(f"\nMean Salary by Industry:\n{industry_salaries}")

    # Salary by Work Place
    workplace_salaries = df_filtered.groupby("Work Place")["Min Salary"].mean().astype(int).sort_values(ascending=False)
    workplace_salaries = workplace_salaries.apply(lambda x: f"{x:,}")
    print(f"\nMean Salary by Work Place:\n{workplace_salaries}")
Run the Analysis
Finally, let's put everything together in a main function.
def main():
    # Fetch job postings
    jobs = fetch_job_postings(params)

    # Format and save the data
    df = format_job_postings(jobs)
    save_to_tsv(df)

    # Analyze salaries
    analyze_salaries(df)

if __name__ == "__main__":
    main()
Summary
In this tutorial, we explored how to retrieve job postings using an API, process and structure the data with Python and Pandas, and conduct a salary analysis to uncover key trends. By working with real-world job data, you’ve gained practical experience in data extraction, cleaning, and statistical evaluation - essential skills for any data analyst or aspiring data scientist.
But this is just the beginning. There are countless ways to expand on this project. You could visualize salary distributions, analyze salary variations across industries and locations, or even build predictive models with machine learning to estimate salaries based on job descriptions. With each step, you’ll refine your ability to derive meaningful insights from data, a crucial skill in today’s data-driven world.
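For instance, a histogram of minimum salaries is a quick first extension. Here is a minimal sketch, assuming matplotlib is installed (pip install matplotlib) and df is the DataFrame built in Step 3; it reuses the same realistic-salary filter from the analysis step:
import matplotlib.pyplot as plt

def plot_salary_distribution(df):
    # Keep only the realistic salary range used in analyze_salaries
    salaries = df[(df["Min Salary"] > 20000) & (df["Min Salary"] < 500000)]["Min Salary"]
    plt.hist(salaries, bins=30, edgecolor="black")
    plt.title("Distribution of Minimum Yearly Salaries")
    plt.xlabel("Minimum yearly salary (USD)")
    plt.ylabel("Number of job postings")
    plt.tight_layout()
    plt.show()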
Keep experimenting, keep analyzing, and most importantly - enjoy the process of discovering patterns hidden within the job market. Happy coding!