Added new salary-related terms and improved image outputs in salary.ipynb
@@ -0,0 +1,140 @@
# Salary Analytics

Salary Analytics analyzes transaction data to identify salary earners, predict future salaries, and generate reports as CSV files and plots.

## Features

- **Transaction Analysis**
  - Keyword-based salary transaction identification
  - Consistent amount transaction analysis
  - Transaction type analysis
  - Hypothesis overlap visualization

- **Salary Earner Classification**
  - Verified salary earners identification
  - Likely salary earners identification
  - High earner detection
  - Salary pattern analysis

- **Machine Learning**
  - Salary prediction models
  - Separate models for consistent and inconsistent earners
  - Feature engineering
  - Model evaluation metrics

- **Reporting**
  - CSV reports generation
  - Visualization plots
  - High earner details
  - Salary earner statistics

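To make the hypothesis overlap concrete, here is a minimal sketch of how accounts flagged by the three analyses could be compared before plotting. It assumes each analysis yields a set of account IDs; the `overlap_regions` helper, the hypothesis names, and the sample IDs are hypothetical and not taken from the project's code.

```python
def overlap_regions(flagged):
    """Count accounts in each exclusive region of a Venn diagram.

    `flagged` maps a hypothesis name to the set of account IDs it flagged.
    The result maps a frozenset of hypothesis names to the number of
    accounts flagged by exactly those hypotheses and no others.
    """
    all_accounts = set().union(*flagged.values())
    regions = {}
    for account in all_accounts:
        key = frozenset(name for name, ids in flagged.items() if account in ids)
        regions[key] = regions.get(key, 0) + 1
    return regions


# Hypothetical results from the three analyses
flagged = {
    "keyword": {"A1", "A2", "A3"},
    "consistent_amount": {"A2", "A3", "A4"},
    "transaction_type": {"A3", "A5"},
}
regions = overlap_regions(flagged)

for names, count in sorted(regions.items(), key=lambda item: sorted(item[0])):
    print(sorted(names), count)
```

Counting exclusive regions, rather than pairwise intersections, gives the numbers a Venn-style plot would need directly.
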
## Architecture

The project is organized into the following modules:

```
salary_analytics/
├── __init__.py
├── config.py                      # Configuration settings
├── data_loader.py                 # Database connection and data loading
├── keyword_analyzer.py            # Keyword-based analysis
├── consistent_amount_analyzer.py  # Consistent amount analysis
├── transaction_type_analyzer.py   # Transaction type analysis
├── salary_earner_analyzer.py      # Salary earner analysis
├── salary_predictor.py            # Machine learning models
├── main.py                        # Main pipeline
└── api.py                         # FastAPI endpoints
```

## Configuration

The system can be configured through environment variables or the `config.py` file:

```python
# Database Configuration
DB_CONFIG = {
    "user": "db_user",
    "password": "your_secure_password",
    "name": "salary_db",
    "port": "5432",
    "host": "localhost"
}

# Model Configuration
MODEL_CONFIG = {
    "cv_threshold": 0.10,
    "min_transactions": 3,
    "threshold": 0.7,
    "high_earner_threshold": 10000
}
```
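
The README does not list the environment variable names, so the following is only a sketch of one way such overrides could work. The `DB_<KEY>` naming scheme and the `load_db_config` helper are assumptions for illustration; check `config.py` for the names the project actually reads.

```python
import os


def load_db_config(env=None):
    """Build database settings, letting environment variables override defaults."""
    env = os.environ if env is None else env
    defaults = {
        "user": "db_user",
        "password": "your_secure_password",
        "name": "salary_db",
        "port": "5432",
        "host": "localhost",
    }
    # Hypothetical naming scheme: DB_USER, DB_PASSWORD, DB_NAME, DB_PORT, DB_HOST
    return {
        key: env.get(f"DB_{key.upper()}", default)
        for key, default in defaults.items()
    }


# Passing a plain dict instead of os.environ keeps the example reproducible
config = load_db_config({"DB_HOST": "db.internal", "DB_PORT": "5433"})
print(config["host"], config["port"], config["user"])
```

Keeping the password in an environment variable rather than in `config.py` avoids committing credentials to the repository.
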
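
As a rough illustration of how the model settings might be applied, the sketch below labels one account's candidate salary amounts. It assumes `cv_threshold` is a limit on the coefficient of variation (standard deviation divided by mean), `min_transactions` is the minimum number of payments required, and `high_earner_threshold` is compared against the average amount. These readings are inferred from the key names, and `classify_amounts` is a hypothetical helper. The `threshold` key is left out because its meaning is not stated.

```python
from statistics import mean, pstdev

MODEL_CONFIG = {
    "cv_threshold": 0.10,
    "min_transactions": 3,
    "high_earner_threshold": 10000,
}


def classify_amounts(amounts, config=MODEL_CONFIG):
    """Label one account's candidate salary amounts.

    Returns a (label, is_high_earner) pair where label is one of
    "insufficient_data", "consistent", or "inconsistent".
    """
    if len(amounts) < config["min_transactions"]:
        return "insufficient_data", False
    avg = mean(amounts)
    # Coefficient of variation: spread of the amounts relative to their mean
    cv = pstdev(amounts) / avg if avg else float("inf")
    label = "consistent" if cv <= config["cv_threshold"] else "inconsistent"
    return label, avg >= config["high_earner_threshold"]


print(classify_amounts([5000, 5050, 4980]))
print(classify_amounts([12000, 4000, 20000]))
print(classify_amounts([5000, 5000]))
```

A relative measure such as the coefficient of variation lets one threshold serve both small and large salaries, which a fixed absolute tolerance would not.
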
## Usage

### Using the API

1. Start the API server:
   ```bash
   uvicorn salary_analytics.api:app --reload
   ```

2. Access the API documentation:
   - Swagger UI: http://localhost:8000/docs
   - ReDoc: http://localhost:8000/redoc

### API Endpoints

1. **Basic Endpoints**
   - `GET /`: Welcome message
   - `GET /health`: Health check

2. **Analysis Endpoints**
   - `POST /analyze/keyword`: Run keyword analysis
   - `POST /analyze/consistent-amount`: Run consistent amount analysis
   - `POST /analyze/transaction-type`: Run transaction type analysis

3. **Report Generation**
   - `POST /generate/reports`: Generate all reports
   - `GET /download/{report_type}`: Download specific reports
   - Available types:
     - `high_earners`: High earner details
     - `likely_earners`: Likely salary earners
     - `final_table`: Final analysis table
     - `consistent_plot`: Consistent earners plot
     - `inconsistent_plot`: Inconsistent earners plot
     - `hypothesis_plot`: Hypothesis overlap plot

4. **Model Training**
   - `POST /train/models`: Train prediction models

5. **Pipeline**
   - `POST /run/pipeline`: Run complete pipeline

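The endpoints listed above can be called from a small standard-library client. This sketch assumes the server runs at `http://localhost:8000` and that the `POST` endpoints accept an empty body, which the README does not confirm; `build_request` and `call` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(method, path, payload=None):
    """Create a request for one endpoint without sending it."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=30):
    """Send the request and return the raw response body as bytes."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Requests for a plausible session: check health, run everything, fetch a report
steps = [
    build_request("GET", "/health"),
    build_request("POST", "/run/pipeline"),
    build_request("GET", "/download/high_earners"),
]
for step in steps:
    print(step.get_method(), step.full_url)
```

With the server running, `call("GET", "/health")` sends the first of these requests. Report downloads return file contents, so the bytes from `call("GET", "/download/high_earners")` can be written straight to disk.
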
## Docker Deployment

1. Build the Docker image:
   ```bash
   docker-compose build
   ```

2. Run the container:
   ```bash
   docker-compose up
   ```

The API will be available at http://localhost:8000.

## Output Structure

```
output/
├── csv/
│   ├── high_earner_details.csv
│   ├── likely_salary_earner.csv
│   └── final_table.csv
└── plots/
    ├── consistent_earners_predictions.png
    ├── inconsistent_earners_predictions.png
    └── hypothesis_overlap.png
```
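
The six download types appear to correspond one-to-one with the six files above. The sketch below spells out that pairing; it is inferred from the names alone, not confirmed by the source, and `REPORT_FILES` and `resolve_report` are hypothetical.

```python
from pathlib import Path

OUTPUT_DIR = Path("output")

# Assumed pairing of download types with generated files, inferred from names
REPORT_FILES = {
    "high_earners": OUTPUT_DIR / "csv" / "high_earner_details.csv",
    "likely_earners": OUTPUT_DIR / "csv" / "likely_salary_earner.csv",
    "final_table": OUTPUT_DIR / "csv" / "final_table.csv",
    "consistent_plot": OUTPUT_DIR / "plots" / "consistent_earners_predictions.png",
    "inconsistent_plot": OUTPUT_DIR / "plots" / "inconsistent_earners_predictions.png",
    "hypothesis_plot": OUTPUT_DIR / "plots" / "hypothesis_overlap.png",
}


def resolve_report(report_type):
    """Return the file path for a report type, or raise ValueError."""
    try:
        return REPORT_FILES[report_type]
    except KeyError:
        valid = ", ".join(sorted(REPORT_FILES))
        raise ValueError(
            f"Unknown report type {report_type!r}; expected one of: {valid}"
        ) from None


print(resolve_report("final_table").as_posix())
```

Resolving types through a fixed table, rather than building a path from the request, also keeps arbitrary user input out of file paths.
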