# Salary Analytics
Salary Analytics analyzes transaction data to identify salary earners, predict future salaries, and generate CSV and plot reports.
## Features
- **Transaction Analysis**
  - Keyword-based salary transaction identification
  - Consistent amount transaction analysis
  - Transaction type analysis
  - Hypothesis overlap visualization
- **Salary Earner Classification**
  - Verified salary earners identification
  - Likely salary earners identification
  - High earner detection
  - Salary pattern analysis
- **Machine Learning**
  - Salary prediction models
  - Separate models for consistent and inconsistent earners
  - Feature engineering
  - Model evaluation metrics
  - Model persistence (saved in `output/models`)
- **Reporting**
  - CSV reports generation
  - Visualization plots
  - High earner details
  - Salary earner statistics
## Architecture
The project is organized into the following modules:
```
salary_analytics/
├── __init__.py
├── config.py # Configuration settings
├── data_loader.py # Database connection and data loading
├── keyword_analyzer.py # Keyword-based analysis
├── consistent_amount_analyzer.py # Consistent amount analysis
├── transaction_type_analyzer.py # Transaction type analysis
├── salary_earner_analyzer.py # Salary earner analysis
├── salary_predictor.py # Machine learning models
├── main.py # Main pipeline
└── api.py # FastAPI endpoints
```
## Configuration
The system can be configured through environment variables using a `.env` file:
1. Copy the example environment file:
```bash
cp .env.example .env
```
2. Edit the `.env` file with your database credentials:
```bash
DB_USER=your_username
DB_PASSWORD=your_password
DB_NAME=your_database
DB_PORT=your_port
DB_HOST=your_host
```
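
As an illustration of how these variables might be consumed, here is a minimal sketch, not the project's actual `config.py`. It assumes the `DB_*` values are already in the process environment (for example, loaded from `.env` by a tool such as python-dotenv). The function names and the `postgresql` URL scheme are hypothetical; this README does not name the database engine.

```python
import os
from urllib.parse import quote_plus

# The five variables listed in the .env example above.
REQUIRED_KEYS = ("DB_USER", "DB_PASSWORD", "DB_NAME", "DB_PORT", "DB_HOST")


def load_db_settings(env=None):
    """Read the DB_* variables and fail early if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED_KEYS}
    settings["DB_PORT"] = int(settings["DB_PORT"])  # ports are numeric
    return settings


def build_db_url(settings, scheme="postgresql"):
    """Build a connection URL; the default scheme is an assumption."""
    user = quote_plus(settings["DB_USER"])
    password = quote_plus(settings["DB_PASSWORD"])  # escape characters such as '@'
    host, port, name = settings["DB_HOST"], settings["DB_PORT"], settings["DB_NAME"]
    return f"{scheme}://{user}:{password}@{host}:{port}/{name}"
```

Failing at startup when a variable is missing tends to give a clearer error than a connection failure later in the pipeline.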
## Usage
### Using the API
1. Start the API server:
```bash
uvicorn salary_analytics.api:app --reload
```
2. Access the API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
### API Endpoints
1. **Basic Endpoints**
   - `GET /`: Welcome message
   - `GET /health`: Health check
2. **Data Loading**
   - `POST /load-data`: Load transaction data
     - Parameters:
       - `source`: Data source ('db' or 'csv')
       - `file`: CSV file (required if source is 'csv')
     - Example:
       ```bash
       # Load from database
       curl -X POST "http://localhost:8000/load-data?source=db"
       # Load from CSV
       curl -X POST "http://localhost:8000/load-data?source=csv" -F "file=@path/to/your/file.csv"
       ```
3. **Analysis Endpoints**
   - `POST /analyze/keyword`: Run keyword analysis
   - `POST /analyze/consistent-amount`: Run consistent amount analysis
   - `POST /analyze/transaction-type`: Run transaction type analysis
4. **Report Generation**
   - `POST /generate/reports`: Generate all reports
   - `GET /download/{report_type}`: Download specific reports
     - Available types:
       - `high_earners`: High earner details
       - `likely_earners`: Likely salary earners
       - `final_table`: Final analysis table
       - `consistent_plot`: Consistent earners plot
       - `inconsistent_plot`: Inconsistent earners plot
       - `hypothesis_plot`: Hypothesis overlap plot
5. **Model Training**
   - `POST /train/models`: Train prediction models
6. **Pipeline**
   - `POST /run/pipeline`: Run complete pipeline
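
A client calling `GET /download/{report_type}` may want to check the report type before sending the request. The helper below is a small sketch under that assumption. `download_url` is a hypothetical name, and the base URL is the local development address used elsewhere in this README.

```python
from urllib.parse import quote

# Report types accepted by GET /download/{report_type}, as listed above.
REPORT_TYPES = {
    "high_earners",
    "likely_earners",
    "final_table",
    "consistent_plot",
    "inconsistent_plot",
    "hypothesis_plot",
}


def download_url(report_type, base_url="http://localhost:8000"):
    """Return the download URL for a known report type, or raise ValueError."""
    if report_type not in REPORT_TYPES:
        known = ", ".join(sorted(REPORT_TYPES))
        raise ValueError(f"Unknown report type {report_type!r}; expected one of: {known}")
    return f"{base_url.rstrip('/')}/download/{quote(report_type, safe='')}"
```

For example, `download_url("final_table")` builds the URL for the final analysis table, which can then be fetched with any HTTP client.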
### Workflow
1. Start the API server
2. Load data using the `/load-data` endpoint
3. Run any of the analysis endpoints
4. Generate and download reports as needed

Note: All analysis endpoints require data to be loaded first. Calling one before loading data returns a 400 error with a message asking you to load data first.
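
The workflow can also be scripted. The following is a rough sketch using only the Python standard library; it assumes the server runs at `http://localhost:8000` and that the endpoints accept a `POST` with no body. `build_requests` and `run_workflow` are hypothetical helper names, not part of the project.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local development address

# Endpoints from this README, in workflow order: load, analyze, report.
WORKFLOW = [
    ("POST", "/load-data?source=db"),
    ("POST", "/analyze/keyword"),
    ("POST", "/analyze/consistent-amount"),
    ("POST", "/analyze/transaction-type"),
    ("POST", "/generate/reports"),
]


def build_requests(base_url=BASE_URL):
    """Create one urllib Request per workflow step without sending anything."""
    root = base_url.rstrip("/")
    return [urllib.request.Request(root + path, method=method) for method, path in WORKFLOW]


def run_workflow(base_url=BASE_URL):
    """Send each step in order; stop and return False on a 400 response."""
    for request in build_requests(base_url):
        try:
            with urllib.request.urlopen(request) as response:
                print(request.get_method(), request.full_url, response.status)
        except urllib.error.HTTPError as err:
            if err.code == 400:
                # The note above says a 400 means data has not been loaded yet.
                print("400 from", request.full_url, err.read().decode(errors="replace"))
                return False
            raise
    return True
```

Keeping request construction separate from sending makes the step order easy to inspect or adjust before anything touches the server.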
## Docker Deployment
1. Build the Docker image:
```bash
docker-compose build
```
2. Run the container with environment variables:
```bash
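# -p publishes the API port to the host; container port 8000 is assumed from the URL below
docker run -p 8000:8000 \
  -v "$(pwd)/output:/app/output" \
  -e DB_USER=your_username \
  -e DB_PASSWORD=your_password \
  -e DB_NAME=your_database \
  -e DB_PORT=your_port \
  -e DB_HOST=your_host \
  salary-analytics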
```
The API will be available at http://localhost:8000
## Output Structure
```
output/
├── csv/
│ ├── high_earner_details.csv
│ ├── likely_salary_earner.csv
│ └── final_table.csv
├── plots/
│ ├── consistent_earners_predictions.png
│ ├── inconsistent_earners_predictions.png
│ └── hypothesis_overlap.png
└── models/
├── consistent_model.joblib
├── inconsistent_model.joblib
├── consistent_scaler.joblib
└── inconsistent_scaler.joblib
```
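
After a pipeline run, a quick check that the expected files exist can catch a partial run. The sketch below assumes the layout shown above. The mapping from report types to file names is inferred from the names in this README rather than taken from the code, and `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping of /download report types to files under output/.
REPORT_FILES = {
    "high_earners": "csv/high_earner_details.csv",
    "likely_earners": "csv/likely_salary_earner.csv",
    "final_table": "csv/final_table.csv",
    "consistent_plot": "plots/consistent_earners_predictions.png",
    "inconsistent_plot": "plots/inconsistent_earners_predictions.png",
    "hypothesis_plot": "plots/hypothesis_overlap.png",
}

# Persisted models and scalers from the tree above.
MODEL_FILES = [
    "models/consistent_model.joblib",
    "models/inconsistent_model.joblib",
    "models/consistent_scaler.joblib",
    "models/inconsistent_scaler.joblib",
]


def missing_outputs(output_dir="output"):
    """Return the relative paths of expected files that are not present."""
    root = Path(output_dir)
    expected = list(REPORT_FILES.values()) + MODEL_FILES
    return [relative for relative in expected if not (root / relative).is_file()]
```

An empty list from `missing_outputs()` means every file in the tree above was found.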