Update project structure and enhance model persistence

- Added new model and scaler files to .gitignore and output directory.
- Updated Dockerfile to create output/models directory.
- Revised README to include instructions for using a .env file for configuration.
- Enhanced config.py to load database credentials from environment variables.
- Implemented model saving functionality in salary_predictor.py for consistent and inconsistent earners.
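The config.py bullet can be illustrated with a minimal sketch. It is not the commit's code. It assumes the `DB_USER`, `DB_PASSWORD`, `DB_NAME`, `DB_PORT` and `DB_HOST` names from the README's `.env` example, and maps them onto the `user`/`password`/`name`/`port`/`host` keys of the earlier `DB_CONFIG` dict. The helpers `load_env_file` and `build_db_config` are hypothetical, and the small parser stands in for whatever `.env` loader the project actually uses.

```python
import os
from pathlib import Path
from typing import Dict

# Environment variable -> DB_CONFIG key. The names follow the README's .env
# example; the mapping itself is an assumption about config.py.
ENV_TO_KEY = {
    "DB_USER": "user",
    "DB_PASSWORD": "password",
    "DB_NAME": "name",
    "DB_PORT": "port",
    "DB_HOST": "host",
}


def load_env_file(path: str = ".env") -> None:
    """Hypothetical minimal .env reader: KEY=VALUE lines, '#' comments.

    Variables already set in os.environ are kept, so values passed with
    `docker run -e ...` take precedence over the file.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return  # no file: rely on the real environment only
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and simple quotes around the value.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def build_db_config() -> Dict[str, str]:
    """Build a DB_CONFIG-style dict, failing fast on missing or empty values."""
    missing = [name for name in ENV_TO_KEY if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: os.environ[name] for name, key in ENV_TO_KEY.items()}
```

Calling `load_env_file()` and then `build_db_config()` when a config module is imported would reproduce the old `DB_CONFIG` shape without hard-coded credentials. Because existing environment variables win, the `-e` flags in the Docker instructions override anything in the file.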
2025-05-02 00:16:46 +01:00
parent 8acfb436f3
commit 5767f55686
8 changed files with 82 additions and 43 deletions
+30 -23
@@ -21,6 +21,7 @@ A comprehensive salary analytics system that analyzes transaction data to identi
 - Separate models for consistent and inconsistent earners
 - Feature engineering
 - Model evaluation metrics
+- Model persistence (saved in output/models)
 - **Reporting**
 - CSV reports generation
@@ -48,25 +49,20 @@ salary_analytics/
 ## Configuration
-The system can be configured through environment variables or the `config.py` file:
+The system can be configured through environment variables using a `.env` file:
-```python
-# Database Configuration
-DB_CONFIG = {
-    "user": "db_user",
-    "password": "your_secure_password",
-    "name": "salary_db",
-    "port": "5432",
-    "host": "localhost"
-}
+1. Copy the example environment file:
+```bash
+cp .env.example .env
+```
-# Model Configuration
-MODEL_CONFIG = {
-    "cv_threshold": 0.10,
-    "min_transactions": 3,
-    "threshold": 0.7,
-    "high_earner_threshold": 10000
-}
+2. Edit the `.env` file with your database credentials:
+```bash
+DB_USER=your_username
+DB_PASSWORD=your_password
+DB_NAME=your_database
+DB_PORT=your_port
+DB_HOST=your_host
+```
 ```
 ## Usage
@@ -140,9 +136,15 @@ Note: All analysis endpoints require data to be loaded first. If you try to run
 docker-compose build
 ```
-2. Run the container:
+2. Run the container with environment variables:
 ```bash
-docker-compose up
+docker run -v $(pwd)/output:/app/output \
+  -e DB_USER=your_username \
+  -e DB_PASSWORD=your_password \
+  -e DB_NAME=your_database \
+  -e DB_PORT=your_port \
+  -e DB_HOST=your_host \
+  salary-analytics
 ```
 The API will be available at http://localhost:8000
@@ -155,8 +157,13 @@ output/
 │   ├── high_earner_details.csv
 │   ├── likely_salary_earner.csv
 │   └── final_table.csv
-└── plots/
-    ├── consistent_earners_predictions.png
-    ├── inconsistent_earners_predictions.png
-    └── hypothesis_overlap.png
+├── plots/
+│   ├── consistent_earners_predictions.png
+│   ├── inconsistent_earners_predictions.png
+│   └── hypothesis_overlap.png
+└── models/
+    ├── consistent_model.joblib
+    ├── inconsistent_model.joblib
+    ├── consistent_scaler.joblib
+    └── inconsistent_scaler.joblib
 ```
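The `models/` layout above implies one model file and one scaler file per earner group. A minimal sketch of saving and reloading them follows. The function names `artifact_paths`, `save_group_artifacts` and `load_group_artifacts`, the `output/models` default, and the use of joblib are assumptions inferred from the file names, not the code in salary_predictor.py. When joblib is not installed, the sketch falls back to the standard-library `pickle` module, so the fallback files are plain pickles despite the `.joblib` extension.

```python
import pickle
from pathlib import Path
from typing import Any, Tuple

try:
    # joblib.dump(value, filename) and joblib.load(filename) are joblib's
    # standard persistence calls; the .joblib file names hint at its use.
    from joblib import dump as _dump, load as _load
except ImportError:  # fallback so the sketch runs without joblib

    def _dump(value: Any, filename: str) -> None:
        with open(filename, "wb") as fh:
            pickle.dump(value, fh)

    def _load(filename: str) -> Any:
        with open(filename, "rb") as fh:
            return pickle.load(fh)


GROUPS = ("consistent", "inconsistent")


def artifact_paths(group: str, models_dir: str = "output/models") -> Tuple[Path, Path]:
    """Return (model_path, scaler_path) following the README's naming."""
    if group not in GROUPS:
        raise ValueError(f"unknown earner group: {group!r}")
    base = Path(models_dir)
    return base / f"{group}_model.joblib", base / f"{group}_scaler.joblib"


def save_group_artifacts(group: str, model: Any, scaler: Any,
                         models_dir: str = "output/models") -> Tuple[Path, Path]:
    """Persist a fitted model and its scaler side by side."""
    model_path, scaler_path = artifact_paths(group, models_dir)
    # The Dockerfile creates this directory in the image; mkdir covers local runs.
    model_path.parent.mkdir(parents=True, exist_ok=True)
    _dump(model, str(model_path))
    _dump(scaler, str(scaler_path))
    return model_path, scaler_path


def load_group_artifacts(group: str, models_dir: str = "output/models") -> Tuple[Any, Any]:
    """Reload the (model, scaler) pair saved for one earner group."""
    model_path, scaler_path = artifact_paths(group, models_dir)
    return _load(str(model_path)), _load(str(scaler_path))
```

After fitting, a training routine would call something like `save_group_artifacts("consistent", model, scaler)` once per group. Storing the scaler next to its model matters because predictions on new data must apply the same scaling the model was trained with.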