Salary Analytics

A salary analytics system that analyzes transaction data to identify salary earners, predict future salaries, and generate CSV reports and plots.

Features

  • Transaction Analysis

    • Keyword-based salary transaction identification
    • Consistent amount transaction analysis
    • Transaction type analysis
    • Hypothesis overlap visualization
  • Salary Earner Classification

    • Verified salary earner identification
    • Likely salary earner identification
    • High earner detection
    • Salary pattern analysis
  • Machine Learning

    • Salary prediction models
    • Separate models for consistent and inconsistent earners
    • Feature engineering
    • Model evaluation metrics
  • Reporting

    • CSV report generation
    • Visualization plots
    • High earner details
    • Salary earner statistics
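
As a rough illustration of the keyword-based and consistent-amount checks listed above, here is a minimal sketch. It assumes that "consistent" means a low coefficient of variation (standard deviation divided by mean) across an account's candidate transactions, with thresholds mirroring the cv_threshold and min_transactions values in MODEL_CONFIG. The keyword list and all function names are hypothetical and are not taken from the project code.

```python
from statistics import mean, pstdev

# Illustrative values mirroring MODEL_CONFIG in the Configuration section.
CV_THRESHOLD = 0.10
MIN_TRANSACTIONS = 3

# Hypothetical keyword list; the project's actual keywords are not documented here.
SALARY_KEYWORDS = ("salary", "payroll", "wages")


def has_salary_keyword(description: str) -> bool:
    """Return True if the description contains any of the assumed salary keywords."""
    text = description.lower()
    return any(keyword in text for keyword in SALARY_KEYWORDS)


def coefficient_of_variation(amounts: list[float]) -> float:
    """Population standard deviation divided by the absolute mean."""
    average = mean(amounts)
    if average == 0:
        return float("inf")
    return pstdev(amounts) / abs(average)


def is_consistent(
    amounts: list[float],
    cv_threshold: float = CV_THRESHOLD,
    min_transactions: int = MIN_TRANSACTIONS,
) -> bool:
    """Treat amounts as consistent when there are enough of them and they vary little."""
    if len(amounts) < min_transactions:
        return False
    return coefficient_of_variation(amounts) <= cv_threshold
```

Under these assumptions, three monthly credits of 3000, 3050, and 2980 count as consistent, while two credits are rejected for being below the minimum transaction count regardless of their amounts.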

Architecture

The project is organized into the following modules:

salary_analytics/
├── __init__.py
├── config.py           # Configuration settings
├── data_loader.py      # Database connection and data loading
├── keyword_analyzer.py # Keyword-based analysis
├── consistent_amount_analyzer.py # Consistent amount analysis
├── transaction_type_analyzer.py  # Transaction type analysis
├── salary_earner_analyzer.py     # Salary earner analysis
├── salary_predictor.py # Machine learning models
├── main.py            # Main pipeline
└── api.py             # FastAPI endpoints

Configuration

The system can be configured through environment variables or the config.py file:

# Database Configuration
DB_CONFIG = {
    "user": "db_user",
    "password": "your_secure_password",
    "name": "salary_db",
    "port": "5432",
    "host": "localhost"
}

# Model Configuration
MODEL_CONFIG = {
    "cv_threshold": 0.10,
    "min_transactions": 3,
    "threshold": 0.7,
    "high_earner_threshold": 10000
}
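
The README does not name the environment variables the system reads, so the following is only a sketch of one common pattern: each key in DB_CONFIG is overridden by a variable named DB_<KEY> when it is set. The variable names (DB_USER, DB_HOST, and so on) and the load_db_config helper are assumptions for illustration, not the project's documented interface.

```python
import os

# Defaults copied from the DB_CONFIG example above.
DEFAULT_DB_CONFIG = {
    "user": "db_user",
    "password": "your_secure_password",
    "name": "salary_db",
    "port": "5432",
    "host": "localhost",
}


def load_db_config(environ=None):
    """Return DB settings, letting assumed DB_<KEY> variables override the defaults."""
    env = os.environ if environ is None else environ
    return {
        key: env.get(f"DB_{key.upper()}", default)
        for key, default in DEFAULT_DB_CONFIG.items()
    }
```

Passing a plain dictionary instead of os.environ, for example load_db_config({"DB_HOST": "db.internal"}), makes the override behavior easy to check without touching the real environment.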

Usage

Using the API

  1. Start the API server:
uvicorn salary_analytics.api:app --reload
  2. Access the API documentation:

API Endpoints

  1. Basic Endpoints

    • GET /: Welcome message
    • GET /health: Health check
  2. Analysis Endpoints

    • POST /analyze/keyword: Run keyword analysis
    • POST /analyze/consistent-amount: Run consistent amount analysis
    • POST /analyze/transaction-type: Run transaction type analysis
  3. Report Generation

    • POST /generate/reports: Generate all reports
    • GET /download/{report_type}: Download specific reports
      • Available types:
        • high_earners: High earner details
        • likely_earners: Likely salary earners
        • final_table: Final analysis table
        • consistent_plot: Consistent earners plot
        • inconsistent_plot: Inconsistent earners plot
        • hypothesis_plot: Hypothesis overlap plot
  4. Model Training

    • POST /train/models: Train prediction models
  5. Pipeline

    • POST /run/pipeline: Run complete pipeline
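
The endpoints above can be called from any HTTP client. The sketch below uses only the Python standard library and assumes the server is reachable at http://localhost:8000 and that the POST endpoints accept an empty request body; neither the request nor the response schemas are documented here, so responses are returned as raw bytes. The helper names are hypothetical.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed address of the running API

# Report types listed under GET /download/{report_type}.
REPORT_TYPES = {
    "high_earners",
    "likely_earners",
    "final_table",
    "consistent_plot",
    "inconsistent_plot",
    "hypothesis_plot",
}


def download_url(report_type: str, base_url: str = BASE_URL) -> str:
    """Build the download URL, rejecting report types the README does not list."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    return f"{base_url}/download/{report_type}"


def post(path: str, base_url: str = BASE_URL, timeout: float = 30.0) -> bytes:
    """Send a POST with an empty body (an assumption) and return the raw response."""
    request = urllib.request.Request(f"{base_url}{path}", data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download(report_type: str, destination: str, base_url: str = BASE_URL,
             timeout: float = 30.0) -> None:
    """Fetch one report and write it to the given local path."""
    url = download_url(report_type, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(destination, "wb") as handle:
        handle.write(data)
```

With the server running, a typical session might call post("/run/pipeline") and then download("final_table", "final_table.csv"). Long-running steps such as model training may need a larger timeout.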

Docker Deployment

  1. Build the Docker image:
docker-compose build
  2. Run the container:
docker-compose up

The API will be available at http://localhost:8000

Output Structure

output/
├── csv/
│   ├── high_earner_details.csv
│   ├── likely_salary_earner.csv
│   └── final_table.csv
└── plots/
    ├── consistent_earners_predictions.png
    ├── inconsistent_earners_predictions.png
    └── hypothesis_overlap.png
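
After a pipeline run, a quick way to confirm that every file in the tree above was produced is to check each path. This is a small sketch using pathlib; the missing_outputs helper is hypothetical, and the default directory name output is taken from the tree.

```python
from pathlib import Path

# Relative paths copied from the output tree above.
EXPECTED_OUTPUTS = (
    "csv/high_earner_details.csv",
    "csv/likely_salary_earner.csv",
    "csv/final_table.csv",
    "plots/consistent_earners_predictions.png",
    "plots/inconsistent_earners_predictions.png",
    "plots/hypothesis_overlap.png",
)


def missing_outputs(output_dir: str = "output") -> list[str]:
    """Return the expected files that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]
```

An empty list from missing_outputs() means all six files are present.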