FedScope Employment Cube

Understanding the 140+ million federal employee records (1998-2024)

⚠️ This is unofficial processed data. For official information, visit fedscope.opm.gov

Quick Navigation

Getting Started

This dataset contains 140+ million records of federal civilian employees from 1998 to 2024. Each record represents one employee in a specific quarter with their job details, demographics, and compensation.

📊 Get the Data: You can either: 🚀 Quick Start: Run examples.py for comprehensive usage examples!

What's Included

Useful For

Privacy: All data is anonymized - no individual employees can be identified.

Understanding the Data Fields

Each employee record has 52 fields that fall into three categories:

Field Format Structure: The dataset contains three types of fields: Tip: Use the description fields (ending in 't') for most analysis - they're much easier to understand!

Field Categories

Selected Key Fields

Start with these key fields for most analyses:

Field What It Shows Example Values
year, quarter When the data was collected 2024, "September"
agysubt Which agency/department "Department of Defense", "Internal Revenue Service"
loct Where they work "CALIFORNIA", "DISTRICT OF COLUMBIA"
occt Their job title "Accountant", "Computer Scientist", "Human Resources Specialist"
patcot Job category "Professional", "Administrative", "Technical"
salary Annual salary (dollars) 65000, 95000, null (when private)
sallvlt Salary range "$60,000 - $69,999", "$90,000 - $99,999"
agelvlt Age group "25-29 YEARS", "45-49 YEARS"
edlvlt Education level "Bachelor's Degree", "Master's Degree"
employment Person count (always 1) 1
Important Data Handling:

Code Examples

🚀 Complete Examples: Run examples.py for comprehensive usage examples with both local and download methods!

Basic Usage Patterns

Load Data

# Load from GitHub (without cloning)
df = pd.read_parquet('https://github.com/abigailhaddad/fedscope_employment/raw/main/fedscope_data/parquet/fedscope_employment_September_2024.parquet')

# Load locally (if you've cloned the repo)
df = pd.read_parquet('fedscope_data/parquet/fedscope_employment_September_2024.parquet')

Count Employees by Agency

# Employment is stored as strings, convert properly
agency_counts = df.groupby('agysubt')['employment'].apply(
    lambda x: sum(int(i) for i in x)
).sort_values(ascending=False).head(10)

Analyze Salaries by Education

# Convert salary to numeric, handling edge cases
df['salary_numeric'] = df['salary'].apply(lambda x: int(float(x)) if x not in [None, 'nan', '*****', ''] and pd.notna(x) else None)
df_with_salary = df[df['salary_numeric'].notna()]
salary_by_edu = df_with_salary.groupby('edlvlt')['salary_numeric'].mean().sort_values(ascending=False)

Track Workforce Over Time

# Group by time periods
quarterly = df.groupby(['year', 'quarter'])['employment'].apply(
    lambda x: sum(int(i) for i in x)
)
💡 Key Points:

Using DuckDB for Multi-Year Analysis

Query multiple years efficiently using DuckDB:

import duckdb

# Create a view from multiple Parquet files
con = duckdb.connect('fedscope.duckdb')
con.execute("""
    CREATE VIEW employment AS 
    SELECT * FROM read_parquet('fedscope_employment_September_2024.parquet')
    UNION ALL
    SELECT * FROM read_parquet('fedscope_employment_September_2023.parquet')
""")

# Query across years
result = con.execute("""
    SELECT year, agysubt, SUM(CAST(employment AS INTEGER)) as employees
    FROM employment
    GROUP BY year, agysubt
    ORDER BY year, employees DESC
""").fetchdf()
📊 DuckDB Benefits:

Complete Field Reference

Field Format Structure: The dataset contains three types of fields: Naming Pattern: Description fields follow the pattern of adding 't' to the code field name (e.g., agelvlagelvlt).
For Analysis: Use the description fields (ending in 't') which provide human-readable values. Code fields are primarily for data processing and joins.

Core Identifiers

Field Name Description Type
dataset_key Unique identifier for each quarterly dataset String
year Calendar year of the snapshot Integer
quarter Quarter of the snapshot (March, June, September, December) String

Demographics

Code Field Description Field Description Example Values
agelvl agelvlt Age level (5-year bands) A: < 20, B: 20-24, C: 25-29, etc.
edlvl edlvlt Education level 00-22 (High school through Doctorate)
los N/A Length of service (data field) Years of federal service

Job Characteristics

Code Field Description Field Description Details
occ occt Occupation code and series 4-digit OPM occupation codes
patco patcot PATCO category Professional, Administrative, Technical, Clerical, Other
pp ppt Pay plan GS, ES, SL, ST, etc. (pp added in 2017, ppt added in 2018)
ppgrd ppgrdt Pay plan and grade Combined pay plan and grade level
gsegrd gsegrdt GS equivalent grade 01-15, SES, or blank
supervis supervist Supervisory status 2: Supervisor, 4: Manager, 6: Leader, 8: Non-supervisor

Compensation & Work Details

Code Field Description Field Description Notes
salary N/A Annual salary (data field) Adjusted basic pay (null when redacted with asterisks)
sallvl sallvlt Salary level Salary ranges in $10K bands
wrksch wrkscht Work schedule F: Full-time, P: Part-time, etc.
toa toat Type of appointment Permanent, temporary, term, etc.
wkstat wkstatt Work status Active, leave without pay, etc.

Organization & Location

Code Field Description Field Description Format
agy N/A Agency code (from lookup) 4-character agency identifier
agysub agysubt Sub-agency code 4-character sub-agency identifier
loc loct Location code State abbreviation or country code
stemocc stemocct STEM occupation indicator 1: STEM, 0: Non-STEM

Coding Schemes

Age Levels (agelvl)

Age is reported in 5-year bands to protect privacy:

Education Levels (edlvl)

Education codes range from 00 to 22:

PATCO Categories

Major Pay Plans

Data Notes & Limitations

What You Should Know

Missing Salaries: Some salary information is marked as private (shows as null). This affects about 8% of records.
Agency Name Changes: Government agencies get reorganized and renamed over 26 years. The same agency might appear with different names in different time periods.
Field Changes Over Time: Pay plan information (pp added in 2017, ppt added in 2018), so earlier years don't include these fields.

Data Coverage by Year

What's NOT Included

Employment Snapshots: This shows who was employed at specific points in time. If you need data on new hires or separations specifically, those are available in separate FedScope datasets.

Resources & Links

Get Started

All files are hosted directly in this GitHub repository: 72 quarterly Parquet files in fedscope_data/parquet/ and original ZIP files in fedscope_data/raw/ for easy download and replication.

Official Government Sources

Need Help?

Citation

If you use this dataset, please cite:

Haddad, A. (2024). FedScope Employment Cube Dataset [Data set]. GitHub. https://github.com/abigailhaddad/fedscope_employment

This is UNOFFICIAL documentation created as part of an independent data processing project

For official information, visit fedscope.opm.gov

Last updated: 2024