
salakojoshua1234_gmail.com 99e1b82ea8 Add autonomous salary detection feature to API

Integrated SalaryDetect class into the API and initiated an autonomous salary detection loop during the startup event. This enhancement improves the system's capability to monitor and analyze salary data in real-time.

2025-07-05 19:27:53 +01:00

demo

Enhance XLS upload functionality and update requirements. Added Flask, Flask-SQLAlchemy, and Alembic to requirements. Modified database schema in upload_xls.py for improved data handling and added SQLAlchemy configuration in config.py.

2025-06-09 15:34:18 +01:00

migrations

2025-06-09 15:34:18 +01:00

salary_analytics

Add autonomous salary detection feature to API

2025-07-05 19:27:53 +01:00

.dockerignore

Update configuration and ignore files; added openpyxl to requirements

2025-06-09 12:45:54 +01:00

.env.example

Doker fix

2025-05-17 03:55:36 -04:00

.gitignore

Update configuration and ignore files; added openpyxl to requirements

2025-06-09 12:45:54 +01:00

docker-compose.yml

Doker fix

2025-05-17 03:55:36 -04:00

Dockerfile

first commit

2025-05-17 03:52:41 -04:00

PROJECT.md

Add autonomous salary detection feature to API

2025-07-05 19:27:53 +01:00

README.md

first commit

2025-05-17 03:52:41 -04:00

requirements.txt

2025-06-09 15:34:18 +01:00

run.py

2025-06-09 15:34:18 +01:00

README.md

Salary Analytics

A comprehensive salary analytics system that analyzes transaction data to identify salary earners, predict future salaries, and generate detailed reports.

Features

Transaction Analysis
- Keyword-based salary transaction identification
- Consistent amount transaction analysis
- Transaction type analysis
- Hypothesis overlap visualization
Salary Earner Classification
- Identification of verified salary earners
- Identification of likely salary earners
- High earner detection
- Salary pattern analysis
Machine Learning
- Salary prediction models
- Separate models for consistent and inconsistent earners
- Feature engineering
- Model evaluation metrics
- Model persistence (saved in output/models)
Reporting
- CSV report generation
- Visualization plots
- High earner details
- Salary earner statistics
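The keyword and consistent-amount checks listed above, and the overlap between them, can be illustrated with a small dependency-free sketch. The keyword pattern, the 3-credit minimum and the 10% coefficient-of-variation cutoff are hypothetical values chosen for illustration, not the thresholds used in keyword_analyzer.py or consistent_amount_analyzer.py. Transactions are assumed to be dicts with customer_id, amount (positive for credits) and description keys.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical keyword pattern; the real keyword_analyzer.py may use other terms.
SALARY_KEYWORDS = re.compile(r"\b(salary|payroll|wages?)\b", re.IGNORECASE)


def keyword_hypothesis(transactions):
    """Customers with at least one credit whose description mentions a salary keyword."""
    return {
        t["customer_id"]
        for t in transactions
        if t["amount"] > 0 and SALARY_KEYWORDS.search(t["description"])
    }


def consistent_amount_hypothesis(transactions, min_credits=3, max_cv=0.1):
    """Customers with at least min_credits credits whose coefficient of
    variation (population stdev / mean) is at most max_cv.
    Both thresholds are illustrative, not the project's values."""
    credits = defaultdict(list)
    for t in transactions:
        if t["amount"] > 0:
            credits[t["customer_id"]].append(t["amount"])
    return {
        customer
        for customer, amounts in credits.items()
        if len(amounts) >= min_credits
        and statistics.pstdev(amounts) / statistics.mean(amounts) <= max_cv
    }


def overlap(a, b):
    """Region sizes of a two-set Venn diagram, as an overlap plot would show."""
    return {"only_a": len(a - b), "only_b": len(b - a), "both": len(a & b)}
```

For example, a customer credited "SALARY MAY" 5000.0 three months running satisfies both checks. A one-off "Payroll credit" only satisfies the keyword check.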

Architecture

The project is organized into the following modules:

salary_analytics/
├── __init__.py
├── config.py                      # Configuration settings
├── data_loader.py                 # Database connection and data loading
├── keyword_analyzer.py            # Keyword-based analysis
├── consistent_amount_analyzer.py  # Consistent amount analysis
├── transaction_type_analyzer.py   # Transaction type analysis
├── salary_earner_analyzer.py      # Salary earner analysis
├── salary_predictor.py            # Machine learning models
├── main.py                        # Main pipeline
└── api.py                         # FastAPI endpoints

Configuration

The system can be configured through environment variables using a .env file:

Copy the example environment file:

cp .env.example .env

Edit the .env file with your database credentials:

DB_USER=your_username
DB_PASSWORD=your_password
DB_NAME=your_database
DB_PORT=your_port
DB_HOST=your_host
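The following sketch shows how these variables might be consumed. It reads them from the process environment, checks that all five are present, and assembles a SQLAlchemy-style connection URL. It assumes the .env values have already been exported into the environment (for example by python-dotenv). The postgresql dialect is also an assumption, because the README does not name the database engine. DBSettings and load_db_settings are hypothetical names, not the contents of config.py.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote_plus

REQUIRED_VARS = ("DB_USER", "DB_PASSWORD", "DB_NAME", "DB_PORT", "DB_HOST")


@dataclass(frozen=True)
class DBSettings:
    user: str
    password: str
    name: str
    port: int
    host: str

    def url(self, dialect="postgresql"):
        # Dialect is an assumption; credentials are percent-encoded so that
        # characters such as "@" in a password do not break the URL.
        return (
            f"{dialect}://{quote_plus(self.user)}:{quote_plus(self.password)}"
            f"@{self.host}:{self.port}/{self.name}"
        )


def load_db_settings(env=None):
    """Read the DB_* variables, failing fast if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_VARS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return DBSettings(
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
        name=env["DB_NAME"],
        port=int(env["DB_PORT"]),
        host=env["DB_HOST"],
    )
```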

Usage

Using the API

Start the API server:

uvicorn salary_analytics.api:app --reload

Access the API documentation:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

API Endpoints

Basic Endpoints
- GET /: Welcome message
- GET /health: Health check

Data Loading

POST /load-data: Load transaction data

Parameters:
- source: Data source ('db' or 'csv')
- file: CSV file (required if source is 'csv')

Example:

# Load from database
curl -X POST "http://localhost:8000/load-data?source=db"

# Load from CSV
curl -X POST "http://localhost:8000/load-data?source=csv" -F "file=@path/to/your/file.csv"
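The same two calls can be made from Python with only the standard library. The hypothetical helper below builds the request without sending it, so the request can be inspected or passed to urllib.request.urlopen. For CSV, it creates a multipart/form-data body with a field named file, matching the -F "file=@..." form of the curl example. The helper name and the default filename are illustrative.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_load_data_request(base_url, source, csv_bytes=None, filename="data.csv"):
    """Build (but do not send) a POST /load-data request."""
    if source not in ("db", "csv"):
        raise ValueError("source must be 'db' or 'csv'")
    url = f"{base_url}/load-data?{urlencode({'source': source})}"
    if source == "db":
        # Database source: no body, only the query parameter.
        return Request(url, method="POST")
    if csv_bytes is None:
        raise ValueError("csv_bytes is required when source is 'csv'")
    # Hand-rolled multipart/form-data body with a single "file" field.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return Request(
        url,
        data=head + csv_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Usage (requires the API to be running):
# from urllib.request import urlopen
# with open("path/to/your/file.csv", "rb") as f:
#     req = build_load_data_request("http://localhost:8000", "csv", f.read(), "file.csv")
# with urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```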

Analysis Endpoints
- POST /analyze/keyword: Run keyword analysis
- POST /analyze/consistent-amount: Run consistent amount analysis
- POST /analyze/transaction-type: Run transaction type analysis
Report Generation
- POST /generate/reports: Generate all reports
- GET /download/{report_type}: Download specific reports
  - Available types:
    - high_earners: High earner details
    - likely_earners: Likely salary earners
    - final_table: Final analysis table
    - consistent_plot: Consistent earners plot
    - inconsistent_plot: Inconsistent earners plot
    - hypothesis_plot: Hypothesis overlap plot
Model Training
- POST /train/models: Train prediction models
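Here is a small client-side sketch for the download endpoint. It validates report_type against the types listed above and picks a local file extension. The extension rule (plots saved as .png, everything else as .csv) is an assumption based on the plots/ and csv/ folders under Output Structure. All helper names are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

# Report types documented for GET /download/{report_type}.
REPORT_TYPES = (
    "high_earners",
    "likely_earners",
    "final_table",
    "consistent_plot",
    "inconsistent_plot",
    "hypothesis_plot",
)


def download_url(base_url, report_type):
    """Validate report_type and build the download URL."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type {report_type!r}; expected one of {REPORT_TYPES}")
    return f"{base_url}/download/{quote(report_type)}"


def local_filename(report_type):
    # Assumption: *_plot reports are PNG images and the others are CSV files.
    extension = "png" if report_type.endswith("_plot") else "csv"
    return f"{report_type}.{extension}"


def download_report(base_url, report_type, dest_dir="."):
    """Fetch one report and save it locally (needs a running API with reports generated)."""
    path = Path(dest_dir) / local_filename(report_type)
    with urlopen(download_url(base_url, report_type)) as resp:
        path.write_bytes(resp.read())
    return path
```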

Pipeline

POST /run/pipeline: Run the complete pipeline

POST /run/streaming-pipeline: Run the pipeline in batches

Parameters:
- source: Data source ('db' or 'csv')
- file: CSV file (required if source is 'csv')
- batch_size: Number of rows to process in each batch (default: 10000)
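Assuming batch_size splits the input into consecutive chunks, the number of batches is the row count divided by batch_size, rounded up. For example, 50,000 rows at batch_size=5000 gives 10 batches. The sketch below shows that chunking with itertools.islice and yields progress records shaped like the response example that follows. It is an illustration, not the project's streaming implementation. It assumes processed_rows counts the rows in each batch, and process_batch is a hypothetical callback.

```python
import math
from itertools import islice


def iter_batches(rows, batch_size=10000):
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def stream_pipeline(rows, batch_size, process_batch):
    """Process rows batch by batch, yielding one progress record per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    total_batches = math.ceil(len(rows) / batch_size)
    for number, batch in enumerate(iter_batches(rows, batch_size), start=1):
        process_batch(batch)  # hypothetical: analyze and write this batch's results
        yield {
            "batch_number": number,
            "total_batches": total_batches,
            "processed_rows": len(batch),
            "message": f"Successfully processed batch {number} of {total_batches}",
        }
```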

Example:

# Run streaming pipeline from database
curl -X POST "http://localhost:8000/run/streaming-pipeline?source=db&batch_size=5000"

# Run streaming pipeline from CSV
curl -X POST "http://localhost:8000/run/streaming-pipeline?source=csv&batch_size=5000" -F "file=@path/to/your/file.csv"

Response:

[
  {
    "batch_number": 1,
    "total_batches": 10,
    "processed_rows": 5000,
    "results_path": "/app/output/csv/batch_results_20240315_123456/batch_1_results.csv",
    "message": "Successfully processed batch 1 of 10"
  },
  // ... more batch responses ...
]
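A client monitoring progress can check the returned list for gaps. The // line in the example above only marks elided entries, since JSON does not allow comments. The hypothetical helper below accepts the response body, either as a JSON string or as an already-decoded list. It assumes processed_rows is a per-batch count, and it reports any missing batch numbers together with the results file paths in batch order.

```python
import json


def summarize_batches(payload):
    """Summarize a streaming-pipeline response (JSON string or list of dicts)."""
    batches = json.loads(payload) if isinstance(payload, str) else payload
    if not batches:
        return {"complete": False, "processed_rows": 0, "missing_batches": [], "results_paths": []}
    total = batches[0]["total_batches"]
    seen = {b["batch_number"] for b in batches}
    missing = sorted(set(range(1, total + 1)) - seen)
    ordered = sorted(batches, key=lambda b: b["batch_number"])
    return {
        "complete": not missing,
        "processed_rows": sum(b["processed_rows"] for b in batches),
        "missing_batches": missing,
        "results_paths": [b["results_path"] for b in ordered],
    }
```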

Workflow

1. Start the API server
2. Load data using the /load-data endpoint
3. Run any of the analysis endpoints
4. Generate and download reports as needed

For large datasets, use the streaming pipeline endpoint:

1. Start the API server
2. Run the streaming pipeline with an appropriate batch size
3. Monitor batch processing progress
4. Access results in the batch results directory

Note: All analysis endpoints require data to be loaded first. If you run an analysis before loading data, the API returns a 400 error asking you to load data first.
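The workflow and the 400 rule can be scripted with the standard library. In this sketch, run_workflow calls the endpoints in the documented order. It turns an HTTP 400 from any step after /load-data into a DataNotLoadedError. The transport is injectable, so the logic can be exercised without a running server. All names are hypothetical, and treating every such 400 as a missing-data error is a simplification.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Documented workflow order: load data, analyze, then generate reports.
WORKFLOW = (
    ("POST", "/load-data?source=db"),
    ("POST", "/analyze/keyword"),
    ("POST", "/analyze/consistent-amount"),
    ("POST", "/analyze/transaction-type"),
    ("POST", "/generate/reports"),
)


class DataNotLoadedError(RuntimeError):
    """Raised when an analysis step answers 400 because no data is loaded."""


def http_send(base_url, method, path):
    """Real transport: return (status, body), including for 4xx/5xx responses."""
    req = Request(base_url + path, method=method)
    try:
        with urlopen(req) as resp:
            return resp.status, resp.read().decode()
    except HTTPError as err:
        return err.code, err.read().decode()


def run_workflow(base_url, send=http_send, steps=WORKFLOW):
    results = []
    for method, path in steps:
        status, body = send(base_url, method, path)
        if status == 400 and not path.startswith("/load-data"):
            raise DataNotLoadedError(f"{method} {path}: {body}")
        if status >= 400:
            raise RuntimeError(f"{method} {path} failed with HTTP {status}: {body}")
        results.append((path, status))
    return results
```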

Docker Deployment

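Build the Docker image with Docker Compose:

docker-compose build

or build it directly with the salary-analytics tag that the docker run command below expects:

docker build -t salary-analytics .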

Run the container with environment variables:

docker run -p 8000:8000 \
           -v $(pwd)/output:/app/output \
           -e DB_USER=your_username \
           -e DB_PASSWORD=your_password \
           -e DB_NAME=your_database \
           -e DB_PORT=your_port \
           -e DB_HOST=your_host \
           salary-analytics

The API will be available at http://localhost:8000
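Because the container may take a few seconds to start, a deployment script can poll the /health endpoint before sending work. This is a minimal sketch with hypothetical names and retry settings. It only assumes that /health returns HTTP 200 once the API is up.

```python
import time
from urllib.request import urlopen


def probe_health(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET /health answers 200; connection errors count as not ready."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def wait_for_health(probe=probe_health, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe until it succeeds; return the attempt number or raise TimeoutError."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"API not healthy after {attempts} attempts")
```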

Output Structure

output/
├── csv/
│   ├── high_earner_details.csv
│   ├── likely_salary_earner.csv
│   └── final_table.csv
├── plots/
│   ├── consistent_earners_predictions.png
│   ├── inconsistent_earners_predictions.png
│   └── hypothesis_overlap.png
└── models/
    ├── consistent_model.joblib
    ├── inconsistent_model.joblib
    ├── consistent_scaler.joblib
    └── inconsistent_scaler.joblib
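After a full pipeline run, you can confirm that every artifact in this tree was written by checking for each path. The list below is copied from the tree, and the helper name is hypothetical. The .joblib files can then be loaded with joblib.load, assuming they were saved with joblib as the extension suggests.

```python
from pathlib import Path

# Paths relative to the output/ directory, as listed in the tree above.
EXPECTED_OUTPUTS = (
    "csv/high_earner_details.csv",
    "csv/likely_salary_earner.csv",
    "csv/final_table.csv",
    "plots/consistent_earners_predictions.png",
    "plots/inconsistent_earners_predictions.png",
    "plots/hypothesis_overlap.png",
    "models/consistent_model.joblib",
    "models/inconsistent_model.joblib",
    "models/consistent_scaler.joblib",
    "models/inconsistent_scaler.joblib",
)


def missing_outputs(output_dir="output"):
    """Return the expected artifacts that are not present as files under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]
```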