# Salary Analytics

Salary Analytics analyzes transaction data to identify salary earners, predict future salaries, and generate CSV reports and plots.

## Features

- **Transaction Analysis**
  - Keyword-based salary transaction identification
  - Consistent amount transaction analysis
  - Transaction type analysis
  - Hypothesis overlap visualization

- **Salary Earner Classification**
  - Identification of verified salary earners
  - Identification of likely salary earners
  - High earner detection
  - Salary pattern analysis

- **Machine Learning**
  - Salary prediction models
  - Separate models for consistent and inconsistent earners
  - Feature engineering
  - Model evaluation metrics

- **Reporting**
  - CSV report generation
  - Visualization plots
  - High earner details
  - Salary earner statistics
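
The consistent amount analysis listed under Transaction Analysis can be illustrated with a coefficient-of-variation check. This is a minimal sketch, not the project's actual implementation: it assumes that `cv_threshold` in the Model Configuration means "coefficient of variation" and that `min_transactions` is the minimum number of credits required. The function names are hypothetical.

```python
from statistics import mean, pstdev

# Assumed values, mirroring the MODEL_CONFIG example in the Configuration section.
CV_THRESHOLD = 0.10
MIN_TRANSACTIONS = 3


def coefficient_of_variation(amounts):
    """Return the population standard deviation divided by the absolute mean."""
    avg = mean(amounts)
    if avg == 0:
        raise ValueError("mean amount is zero; CV is undefined")
    return pstdev(amounts) / abs(avg)


def is_consistent(amounts, cv_threshold=CV_THRESHOLD, min_transactions=MIN_TRANSACTIONS):
    """Flag a series of credits as consistent when it has enough
    transactions and the amounts vary little relative to their mean."""
    if len(amounts) < min_transactions:
        return False
    return coefficient_of_variation(amounts) <= cv_threshold
```

Under these assumptions, three credits of 5000, 5000, and 5100 count as consistent, while 1000, 5000, and 9000 do not.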

## Architecture

The project is organized into the following modules:

```
salary_analytics/
├── __init__.py
├── config.py           # Configuration settings
├── data_loader.py      # Database connection and data loading
├── keyword_analyzer.py # Keyword-based analysis
├── consistent_amount_analyzer.py # Consistent amount analysis
├── transaction_type_analyzer.py  # Transaction type analysis
├── salary_earner_analyzer.py     # Salary earner analysis
├── salary_predictor.py # Machine learning models
├── main.py            # Main pipeline
└── api.py             # FastAPI endpoints
```


## Configuration

The system can be configured through environment variables or the `config.py` file:

```python
# Database Configuration
DB_CONFIG = {
    "user": "db_user",
    "password": "your_secure_password",
    "name": "salary_db",
    "port": "5432",
    "host": "localhost"
}

# Model Configuration
MODEL_CONFIG = {
    "cv_threshold": 0.10,
    "min_transactions": 3,
    "threshold": 0.7,
    "high_earner_threshold": 10000
}
```
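
One way environment variables could override the `config.py` defaults is sketched below. The `DB_USER`, `DB_PASSWORD`, `DB_NAME`, `DB_PORT`, and `DB_HOST` variable names and the `load_db_config` helper are assumptions for illustration; check `config.py` for the names the project actually reads.

```python
import os

# Defaults copied from the DB_CONFIG example above.
DEFAULT_DB_CONFIG = {
    "user": "db_user",
    "password": "your_secure_password",
    "name": "salary_db",
    "port": "5432",
    "host": "localhost",
}


def load_db_config(environ=None):
    """Return DB settings, preferring DB_<KEY> environment variables.

    The DB_<KEY> naming scheme is hypothetical, not taken from config.py.
    """
    env = os.environ if environ is None else environ
    return {
        key: env.get(f"DB_{key.upper()}", default)
        for key, default in DEFAULT_DB_CONFIG.items()
    }
```

With this approach, `DB_HOST=db.internal` in the environment would change only the host and leave the other defaults in place.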

## Usage

### Using the API

1. Start the API server:
   ```bash
   uvicorn salary_analytics.api:app --reload
   ```

2. Access the API documentation:
   - Swagger UI: http://localhost:8000/docs
   - ReDoc: http://localhost:8000/redoc

### API Endpoints

1. **Basic Endpoints**
   - `GET /`: Welcome message
   - `GET /health`: Health check

2. **Analysis Endpoints**
   - `POST /analyze/keyword`: Run keyword analysis
   - `POST /analyze/consistent-amount`: Run consistent amount analysis
   - `POST /analyze/transaction-type`: Run transaction type analysis

3. **Report Generation**
   - `POST /generate/reports`: Generate all reports
   - `GET /download/{report_type}`: Download specific reports
     - Available types:
       - `high_earners`: High earner details
       - `likely_earners`: Likely salary earners
       - `final_table`: Final analysis table
       - `consistent_plot`: Consistent earners plot
       - `inconsistent_plot`: Inconsistent earners plot
       - `hypothesis_plot`: Hypothesis overlap plot

4. **Model Training**
   - `POST /train/models`: Train prediction models

5. **Pipeline**
   - `POST /run/pipeline`: Run complete pipeline
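
The endpoints above can be called from any HTTP client. The sketch below uses only the Python standard library and builds the requests without sending them. It assumes the server runs at the default address and that `POST /run/pipeline` accepts an empty body; the README does not document request parameters, so the real endpoints may need more. The helper names are hypothetical.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # default uvicorn address used in this README

# Report types listed under GET /download/{report_type}.
REPORT_TYPES = {
    "high_earners", "likely_earners", "final_table",
    "consistent_plot", "inconsistent_plot", "hypothesis_plot",
}


def pipeline_request(base_url=BASE_URL):
    """Build (but do not send) the POST request that runs the complete pipeline."""
    # Assumption: the endpoint accepts an empty request body.
    return urllib.request.Request(f"{base_url}/run/pipeline", data=b"", method="POST")


def download_url(report_type, base_url=BASE_URL):
    """Return the download URL for a report, rejecting unknown report types."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    return f"{base_url}/download/{report_type}"
```

With the server running, `urllib.request.urlopen(pipeline_request())` sends the pipeline request, and `urllib.request.urlretrieve(download_url("final_table"), "final_table.csv")` saves a report locally.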

## Docker Deployment

1. Build the Docker image:
   ```bash
   docker-compose build
   ```

2. Run the container:
   ```bash
   docker-compose up
   ```

Once the container is running, the API is available at http://localhost:8000.

## Output Structure

```
output/
├── csv/
│   ├── high_earner_details.csv
│   ├── likely_salary_earner.csv
│   └── final_table.csv
└── plots/
    ├── consistent_earners_predictions.png
    ├── inconsistent_earners_predictions.png
    └── hypothesis_overlap.png
```
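
To check that a pipeline run produced every file in the tree above, a small script along these lines could help. The pairing of `/download` report types with file names is an assumption inferred from the similar names, and `missing_reports` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

# Assumed pairing of report types with files, relative to the output directory.
REPORT_FILES = {
    "high_earners": Path("csv/high_earner_details.csv"),
    "likely_earners": Path("csv/likely_salary_earner.csv"),
    "final_table": Path("csv/final_table.csv"),
    "consistent_plot": Path("plots/consistent_earners_predictions.png"),
    "inconsistent_plot": Path("plots/inconsistent_earners_predictions.png"),
    "hypothesis_plot": Path("plots/hypothesis_overlap.png"),
}


def missing_reports(output_dir="output"):
    """Return the sorted report types whose file is absent under output_dir."""
    root = Path(output_dir)
    return sorted(
        name for name, rel_path in REPORT_FILES.items()
        if not (root / rel_path).exists()
    )
```

An empty list from `missing_reports()` means all six expected files are present.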