Salary Analytics
A comprehensive salary analytics system that analyzes transaction data to identify salary earners, predict future salaries, and generate detailed reports.
Features
- Transaction Analysis
  - Keyword-based identification of salary transactions
  - Analysis of transactions with consistent amounts
  - Transaction type analysis
  - Hypothesis overlap visualization
- Salary Earner Classification
  - Identification of verified salary earners
  - Identification of likely salary earners
  - High earner detection
  - Salary pattern analysis
- Machine Learning
  - Salary prediction models
  - Separate models for consistent and inconsistent earners
  - Feature engineering
  - Model evaluation metrics
  - Model persistence (saved in output/models)
- Reporting
  - CSV report generation
  - Visualization plots
  - High earner details
  - Salary earner statistics
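The feature list names two detection hypotheses without describing how they work. As a rough illustration only, the sketch below shows one common way to express them in plain Python: a keyword match on the transaction narration, and a coefficient-of-variation check on monthly credit totals. The Transaction fields, the keyword list, and both thresholds are assumptions, not the logic actually used in keyword_analyzer.py or consistent_amount_analyzer.py.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical keyword list; the real one lives in the project's own code.
SALARY_KEYWORDS = ("salary", "payroll", "wages")


@dataclass
class Transaction:
    customer_id: str
    month: str       # e.g. "2024-01" (assumed field)
    amount: float    # positive values are treated as credits
    narration: str   # free-text transaction description


def is_salary_keyword_match(txn: Transaction) -> bool:
    """Keyword hypothesis: the narration contains a salary-related word."""
    text = txn.narration.lower()
    return any(word in text for word in SALARY_KEYWORDS)


def is_consistent_earner(
    txns: List[Transaction], max_cv: float = 0.1, min_months: int = 3
) -> bool:
    """Consistent-amount hypothesis: monthly credit totals barely vary.

    Uses the coefficient of variation (population std dev / mean) of the
    monthly totals; max_cv and min_months are illustrative assumptions.
    """
    monthly: Dict[str, float] = {}
    for txn in txns:
        if txn.amount > 0:  # only credits count towards income
            monthly[txn.month] = monthly.get(txn.month, 0.0) + txn.amount
    if len(monthly) < min_months:
        return False
    totals = list(monthly.values())
    return pstdev(totals) / mean(totals) <= max_cv


history = [
    Transaction("c1", "2024-01", 5000.0, "SALARY JAN ACME LTD"),
    Transaction("c1", "2024-02", 5000.0, "SALARY FEB ACME LTD"),
    Transaction("c1", "2024-03", 5100.0, "SALARY MAR ACME LTD"),
]
print(all(is_salary_keyword_match(t) for t in history))  # True
print(is_consistent_earner(history))                     # True
```

A customer flagged by both checks would be a stronger candidate than one flagged by either alone, which is the kind of overlap the hypothesis overlap plot visualizes.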
Architecture
The project is organized into the following modules:
salary_analytics/
├── __init__.py
├── config.py # Configuration settings
├── data_loader.py # Database connection and data loading
├── keyword_analyzer.py # Keyword-based analysis
├── consistent_amount_analyzer.py # Consistent amount analysis
├── transaction_type_analyzer.py # Transaction type analysis
├── salary_earner_analyzer.py # Salary earner analysis
├── salary_predictor.py # Machine learning models
├── main.py # Main pipeline
└── api.py # FastAPI endpoints
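How main.py wires these modules together is not documented here. A minimal sketch, assuming each module contributes one stage that reads from and adds to a shared context, might look like the following; the stage names and the lambda stand-ins are hypothetical placeholders for the real module functions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage receives the shared context and returns new keys to merge into it.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


def run_pipeline(
    stages: List[Tuple[str, Stage]],
    context: Optional[Dict[str, object]] = None,
) -> Dict[str, object]:
    """Run stages in order, threading one context dict through them."""
    ctx: Dict[str, object] = dict(context or {})
    for name, stage in stages:
        # Every stage after loading depends on transaction data being present.
        if name != "load_data" and "transactions" not in ctx:
            raise RuntimeError(f"stage {name!r} requires loaded data")
        ctx.update(stage(ctx))
    return ctx


# Hypothetical stand-ins for the modules listed above.
STAGES: List[Tuple[str, Stage]] = [
    ("load_data", lambda ctx: {"transactions": ["t1", "t2"]}),       # data_loader.py
    ("keyword", lambda ctx: {"keyword_hits": 1}),                    # keyword_analyzer.py
    ("consistent_amount", lambda ctx: {"consistent_ids": ["c1"]}),   # consistent_amount_analyzer.py
    ("transaction_type", lambda ctx: {"type_summary": {}}),          # transaction_type_analyzer.py
    ("salary_earners", lambda ctx: {"earners": ["c1"]}),             # salary_earner_analyzer.py
    ("predict", lambda ctx: {"predictions": {"c1": 5000.0}}),        # salary_predictor.py
]

result = run_pipeline(STAGES)
```

Passing a single context dict keeps each stage independent of the others' internals; api.py could then expose the same stages one endpoint at a time.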
Configuration
The system can be configured through environment variables using a .env file:
- Copy the example environment file:
cp .env.example .env
- Edit the .env file with your database credentials:
DB_USER=your_username
DB_PASSWORD=your_password
DB_NAME=your_database
DB_PORT=your_port
DB_HOST=your_host
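Loading this configuration amounts to reading five variables from the environment. The sketch below is a standard-library approximation of what config.py might do, not its actual code: load_env_file and DBConfig are hypothetical names. In practice the python-dotenv package's load_dotenv() is a common way to read a .env file; like load_dotenv() with its default settings, this sketch does not override variables that are already set.

```python
import os
from dataclasses import dataclass
from pathlib import Path

REQUIRED_VARS = ("DB_USER", "DB_PASSWORD", "DB_NAME", "DB_PORT", "DB_HOST")


def load_env_file(path: str = ".env") -> None:
    """Minimal .env reader: KEY=VALUE lines and '#' comments.

    Variables already present in the environment are kept (setdefault).
    """
    file = Path(path)
    if not file.exists():
        return
    for raw in file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


@dataclass(frozen=True)
class DBConfig:
    user: str
    password: str
    name: str
    port: int
    host: str

    @classmethod
    def from_env(cls) -> "DBConfig":
        """Build the config from environment variables, failing loudly if any are missing."""
        missing = [var for var in REQUIRED_VARS if not os.environ.get(var)]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
            name=os.environ["DB_NAME"],
            port=int(os.environ["DB_PORT"]),
            host=os.environ["DB_HOST"],
        )
```

Calling load_env_file() and then DBConfig.from_env() at startup means the same code works locally (values from .env) and in Docker (values from -e flags).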
Usage
Using the API
- Start the API server:
uvicorn salary_analytics.api:app --reload
- Access the API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
API Endpoints
- Basic Endpoints
  - GET /: Welcome message
  - GET /health: Health check
- Data Loading
  - POST /load-data: Load transaction data
    - Parameters:
      - source: Data source ('db' or 'csv')
      - file: CSV file (required if source is 'csv')
    - Example:
      # Load from database
      curl -X POST "http://localhost:8000/load-data?source=db"
      # Load from CSV
      curl -X POST "http://localhost:8000/load-data?source=csv" -F "file=@path/to/your/file.csv"
- Analysis Endpoints
  - POST /analyze/keyword: Run keyword analysis
  - POST /analyze/consistent-amount: Run consistent amount analysis
  - POST /analyze/transaction-type: Run transaction type analysis
- Report Generation
  - POST /generate/reports: Generate all reports
  - GET /download/{report_type}: Download specific reports
    - Available types:
      - high_earners: High earner details
      - likely_earners: Likely salary earners
      - final_table: Final analysis table
      - consistent_plot: Consistent earners plot
      - inconsistent_plot: Inconsistent earners plot
      - hypothesis_plot: Hypothesis overlap plot
- Model Training
  - POST /train/models: Train prediction models
- Pipeline
  - POST /run/pipeline: Run complete pipeline
Workflow
- Start the API server
- Load data using the /load-data endpoint
- Run any of the analysis endpoints
- Generate and download reports as needed
Note: All analysis endpoints require data to be loaded first. If you try to run any analysis without loading data, you'll receive a 400 error with a message to load data first.
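The workflow can also be scripted from Python. This is a sketch using only urllib from the standard library, assuming the server runs at http://localhost:8000 and data is loaded from the database (source=db); CSV upload needs a multipart body and is left out. The opener parameter is a hypothetical hook that lets the function be exercised without a running server.

```python
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple

BASE_URL = "http://localhost:8000"  # assumed, matching the uvicorn command above

# (method, path, query parameters) in the documented order: load, analyze, report.
WORKFLOW: List[Tuple[str, str, Optional[Dict[str, str]]]] = [
    ("POST", "/load-data", {"source": "db"}),
    ("POST", "/analyze/keyword", None),
    ("POST", "/analyze/consistent-amount", None),
    ("POST", "/analyze/transaction-type", None),
    ("POST", "/generate/reports", None),
    ("GET", "/download/final_table", None),
]


def build_request(
    method: str,
    path: str,
    params: Optional[Dict[str, str]] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a request with an optional URL-encoded query string."""
    url = base_url + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, method=method)


def run_workflow(
    opener: Callable = urllib.request.urlopen, base_url: str = BASE_URL
) -> List[int]:
    """Call each step in order and collect the HTTP status codes.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a failed
    step (such as an analysis call before data is loaded) stops the run.
    """
    statuses: List[int] = []
    for method, path, params in WORKFLOW:
        with opener(build_request(method, path, params, base_url)) as resp:
            statuses.append(resp.status)
    return statuses
```

With the server running, run_workflow() returns one status code per step. If the /load-data step were removed from WORKFLOW, the 400 described in the note above would surface as an HTTPError raised from the first analysis call.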
Docker Deployment
- Build the Docker image:
docker build -t salary-analytics .
- Run the container with environment variables:
docker run -p 8000:8000 -v $(pwd)/output:/app/output \
-e DB_USER=your_username \
-e DB_PASSWORD=your_password \
-e DB_NAME=your_database \
-e DB_PORT=your_port \
-e DB_HOST=your_host \
salary-analytics
The API will be available at http://localhost:8000
Output Structure
output/
├── csv/
│ ├── high_earner_details.csv
│ ├── likely_salary_earner.csv
│ └── final_table.csv
├── plots/
│ ├── consistent_earners_predictions.png
│ ├── inconsistent_earners_predictions.png
│ └── hypothesis_overlap.png
└── models/
├── consistent_model.joblib
├── inconsistent_model.joblib
├── consistent_scaler.joblib
└── inconsistent_scaler.joblib
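To confirm that a run produced every artifact, a small script can compare the tree above against the file system. The mapping from /download/{report_type} names to files is an assumption inferred from the names in this README; the project's code is the authority on the real paths.

```python
from pathlib import Path
from typing import Dict, List

# Assumed mapping from /download/{report_type} names to files under output/.
REPORT_FILES: Dict[str, str] = {
    "high_earners": "csv/high_earner_details.csv",
    "likely_earners": "csv/likely_salary_earner.csv",
    "final_table": "csv/final_table.csv",
    "consistent_plot": "plots/consistent_earners_predictions.png",
    "inconsistent_plot": "plots/inconsistent_earners_predictions.png",
    "hypothesis_plot": "plots/hypothesis_overlap.png",
}

# One model and one scaler per earner group, as listed in output/models.
MODEL_FILES: List[str] = [
    f"models/{group}_{kind}.joblib"
    for group in ("consistent", "inconsistent")
    for kind in ("model", "scaler")
]


def missing_outputs(root: str = "output") -> List[str]:
    """Return expected artifact paths (relative to root) that do not exist yet."""
    base = Path(root)
    expected = list(REPORT_FILES.values()) + MODEL_FILES
    return [rel for rel in expected if not (base / rel).is_file()]
```

Once present, the .joblib files can be read back with joblib.load(path); pairing each scaler with the model of the same earner group is likewise inferred from the file names.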