# OSArchiver: OpenStack databases archiver

OSArchiver is a Python package that aims to archive and remove soft deleted data from OpenStack databases.
The package is shipped with a main script called osarchiver that reads a configuration file and runs the archivers.

# Philosophy

* OSArchiver doesn't have any knowledge of OpenStack business objects
* OSArchiver relies purely on the common way OpenStack marks data as deleted: setting the 'deleted_at' column to a datetime.
  A row is therefore archivable/removable if its 'deleted_at' column is not NULL
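
As an illustration of this rule, the sketch below selects the archivable rows of a toy table. It uses Python's built-in `sqlite3` module as a stand-in for MySQL/MariaDB, and the `instances` table and its rows are made up for the example. It is not the query OSArchiver actually runs.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for an OpenStack database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO instances (id, deleted_at) VALUES (?, ?)",
    [
        (1, None),                   # still in use
        (2, "2019-01-15 10:00:00"),  # soft deleted
        (3, "2019-03-02 08:30:00"),  # soft deleted
    ],
)

# A row is archivable/removable when its 'deleted_at' column is not NULL.
archivable_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM instances WHERE deleted_at IS NOT NULL ORDER BY id"
    )
]
print(archivable_ids)
conn.close()
```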

# Limitations

* Only MySQL/MariaDB is supported as the database backend.
* Python >= 3.5

# Design

OSArchiver reads an INI configuration file in which you can define:

* archivers: a section that holds one source and an optional list of destinations
* sources: a section that defines where the data should be read from (basically the OpenStack DB)
* destinations: a section that defines where the data should be archived

# How it works

                                       .----------.
            .--------------------------| Archiver |-----------------------------.
            |                          '----------'                             |
            |                                                                   |
            |                                                                   |
            |                                                                   |
            v                        _______________                            v
       .--------.                    \              \                    .--------------.
       | Source |-------------------->) ARCHIVE DATA )------------------>| Destinations |
       '--------'                    /______________/                    '--------------'
            |                                |                                  |
            |                                |                                  |
            |                                |                                  |
            |                                |                                  |
            |                                |                                  |
            |                                v                                  |
            |                  .--------------------------.                     |
            v                 ( No error and delete_data=1 )                    |
                               '--------------------------'                     |
        _.-----._                            |                    _.-----._     |
      .-         -.                          |                  .-         -.   |   ___   
      |-_       _-|                          |                  |-_       _-|   |  |   |\ 
      |  ~-----~  |                          |                  |  ~-----~  |<--'->|   ' ___   
      |           |                          |                  |           |      | SQL|   |\ 
      `._       _.'                          |                  `._       _.'      |____|   '-|---.
         "-----"                             |                     "-----"              | CSV |   |
      OpenStack DB                           v                  Archiving DB            |_____|   |
            ^                        _______________                                              v
            |                        \              \                                 .-----------------------.
            '-------------------------) DELETE DATA  )                               ( remote_store configured )
                                     /______________/                                 '-----------------------'
                                                                                                  |
                                                                                                  v
                                                                                             __________ 
                                                                                            [_|||||||_°]
                                                                                            [_|||||||_°]
                                                                                            [_|||||||_°]

                                                                                     Remote Storage (Swift, ...)
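
The control flow of the diagram can be sketched in a few lines of Python. This is a simplified illustration, not OSArchiver's code: `run_archiver` and its callback parameters are hypothetical names, and plain lists stand in for the databases. It shows the one rule that matters: the delete step runs only when `delete_data` is set and the archive step did not fail.

```python
def run_archiver(read_rows, archive_rows, delete_rows,
                 archive_data=True, delete_data=False):
    """Run the archive step, then the delete step if it is allowed."""
    steps_done = []
    rows = read_rows()

    if archive_data:
        try:
            archive_rows(rows)
        except Exception as error:
            # Archiving failed: never delete, so that no data is lost.
            print("archive step failed, delete step skipped:", error)
            return steps_done
        steps_done.append("archive")

    if delete_data:
        delete_rows(rows)
        steps_done.append("delete")

    return steps_done


def unreachable_destination(rows):
    """Hypothetical destination that always fails."""
    raise OSError("destination unreachable")


# Successful run: rows are copied to the archive, then removed from the source.
source_ok = [{"id": 2}, {"id": 3}]
archive_ok = []
steps_ok = run_archiver(
    read_rows=lambda: list(source_ok),
    archive_rows=archive_ok.extend,
    delete_rows=lambda rows: source_ok.clear(),
    delete_data=True,
)

# Failing run: the archive step raises, so the source is left untouched.
source_ko = [{"id": 4}]
steps_ko = run_archiver(
    read_rows=lambda: list(source_ko),
    archive_rows=unreachable_destination,
    delete_rows=lambda rows: source_ko.clear(),
    delete_data=True,
)
print(steps_ok, steps_ko)
```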

# Installation

```
git clone https://github.com/ovh/osarchiver.git
cd osarchiver
pip install -r requirements.txt
pip install .
```

# osarchiver script

```
# osarchiver --help
usage: osarchiver [-h] --config CONFIG [--log-file LOG_FILE]
                  [--log-level {info,warn,error,debug}] [--debug] [--dry-run]

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       Configuration file to read
  --log-file LOG_FILE   Append log to the specified file
  --log-level {info,warn,error,debug}
                        Set log level
  --debug               Enable debug mode
  --dry-run             Display what would be done without really deleting or
                        writing data
```

# Configuration
The configuration is an INI file containing several sections. You configure your
different archivers in this configuration file. An example is available at the
root of the repository.

## DEFAULT section:
* Description: default section that defines default/fallback values for options
* Format **[DEFAULT]**
* configuration parameters: all the parameters of the archiver, source,
  destination and backend sections can be added in this section. They are used
  as fallback values when a value is not set in a section.
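
The fallback behaviour can be tried out with Python's standard `configparser` module, which treats `[DEFAULT]` in the same way. This is only a sketch: it assumes OSArchiver's parsing behaves like the stdlib parser, and the `os_preprod` section is a hypothetical name added for the comparison.

```python
import configparser

SAMPLE = """
[DEFAULT]
delete_data: 0
retention: 12 MONTH

[src:os_prod]
backend: db
delete_data: 1

[src:os_preprod]
backend: db
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

prod = config["src:os_prod"]
preprod = config["src:os_preprod"]

print(prod["delete_data"])     # set in the section itself
print(preprod["delete_data"])  # not set in the section, read from [DEFAULT]
print(prod["retention"])       # not set in the section, read from [DEFAULT]
```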

## Archiver section:

* Description: defines where to read the data from and where to archive and/or delete it.
* Format **[archiver:*name*]**
* configuration parameters:
    * **src**: name of the src section
    * **dst**: comma separated list of destination section names
    * **enable**: 1 or 0, if set to 0 the archiver is ignored and not run

Example:
```properties
[archiver:My_Archiver]
src: os_prod
dst: file, db

[src:os_prod]
...

[dst:file]
...

[dst:db]
....
```
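
To make the naming convention concrete, here is a minimal sketch of how an archiver's `src` and `dst` values could be mapped to their `[src:...]` and `[dst:...]` sections with the standard `configparser` module. `resolve_archivers` is a hypothetical helper written for this example, and treating a missing `enable` option as enabled is an assumption of the sketch.

```python
import configparser

SAMPLE = """
[archiver:My_Archiver]
src: os_prod
dst: file, db
enable: 1

[src:os_prod]
backend: db

[dst:file]
backend: file

[dst:db]
backend: db
"""


def resolve_archivers(config):
    """Map each enabled archiver name to its source and destination sections."""
    resolved = {}
    for section in config.sections():
        kind, _, name = section.partition(":")
        if kind != "archiver":
            continue
        options = config[section]
        # Assumption of this sketch: an archiver without 'enable' is enabled.
        if not options.getboolean("enable", fallback=True):
            continue
        destinations = [
            "dst:" + item.strip()
            for item in options.get("dst", "").split(",")
            if item.strip()
        ]
        resolved[name] = {
            "src": "src:" + options["src"].strip(),
            "dst": destinations,
        }
    return resolved


config = configparser.ConfigParser()
config.read_string(SAMPLE)
archivers = resolve_archivers(config)
print(archivers)
```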

## Source section:

* Description: defines where the OpenStack databases are. For now it supports
  one backend (db), but it may easily be extended
* Format **[src:*name*]**
* configuration parameters:
    * **backend**: the name of the backend to use, only `db` is supported
    * **retention**: how long to keep data in the database, in SQL interval
      format (e.g. 12 MONTH)
    * **archive_data**: 0 or 1, if set to 1 a destination is expected and the
      data is archived, otherwise the archiving step is skipped and only the
      delete step is run
    * **delete_data**: 0 or 1, if set to 1 the delete step is run. If the
      archive step fails, the delete step is not run, to prevent data loss
    * *backend specific options*


## Destination section:

* Description: defines where the data should be written. For now it supports
  two backends (db for database and file [csv, sql]) and may be extended
* Format **[dst:*name*]**
* configuration parameters:
    * **backend**: the name of backend to use, `db` or `file`
    * *backend specific options*


## Backends options:

### db
* Description: the database (MySQL/MariaDB) backend
* options:
    * **host**: DB host to connect to
    * **port**: port the MariaDB server is listening on
    * **user**: MariaDB user to connect with
    * **password**: password of the user
    * **delete_limit**: apply a LIMIT to the DELETE statement
    * **select_limit**: apply a LIMIT to the SELECT statement
    * **bulk_insert**: data is inserted into the DB every bulk_insert rows
    * **deleted_column**: name of the column that holds the soft delete date.
      It is also used to filter the tables to archive: a table must have the
      deleted_column to be archived
    * **where**: the literal SQL WHERE clause applied to the SELECT statement.
      Ex: `where=${deleted_column} <= SUBDATE(NOW(), INTERVAL ${retention})`
    * **foreign_key_check**: true or false, if set to false foreign key checks
      are disabled (default true)
    * **retention**: how long to keep data in the database (SQL format: 12
      MONTH, 1 DAY, etc.)
    * **excluded_databases**: comma, carriage return or semicolon separated
      regexps of DBs to exclude when specifying '*' as database. The following
      DBs are always ignored: 'mysql', 'performance_schema', 'information_schema'
    * **excluded_tables**: comma, carriage return or semicolon separated
      regexps of tables to exclude when specifying '*' as table.
      Ex: `shadow_.*,.*_archived`
    * **db_suffix**: an optional suffix to apply to the archiving DB. The
      default suffix '_archive' is applied if you archive on the same host as
      the source without setting a db_suffix or table_suffix (this avoids
      reading and writing on the same db.table)
    * **table_suffix**: apply a suffix to the archiving table if specified
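
The `${deleted_column}` and `${retention}` placeholders of the `where` example match the syntax of `ExtendedInterpolation` in Python's standard `configparser` module. The sketch below shows how such a value resolves, assuming that interpolation style is the one in use. The `os_short` section is a hypothetical name, added to show that a section's own `retention` takes precedence over the default.

```python
import configparser

SAMPLE = """
[DEFAULT]
deleted_column: deleted_at
retention: 12 MONTH
where: ${deleted_column} <= SUBDATE(NOW(), INTERVAL ${retention})

[src:os_prod]
backend: db

[src:os_short]
backend: db
retention: 1 DAY
"""

# ExtendedInterpolation is the stdlib interpolation that understands ${option}.
config = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
config.read_string(SAMPLE)

# Placeholders are resolved with the values visible from each section.
where_prod = config["src:os_prod"]["where"]
where_short = config["src:os_short"]["where"]
print(where_prod)
print(where_short)
```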

### file
* Description: the file archiving destination type, it writes the data to
  files in one or several formats (supported: SQL, CSV)
* options:
    * **directory**: the directory path where to archive data. You may use the
      {date} keyword to automatically append the date to the directory path.
      (/backup/archive_{date})
    * **formats**: a comma, semicolon or carriage return separated list that
      defines the formats in which to archive the data (csv, sql)
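
As a rough sketch of how these two options could be interpreted: the helpers below are hypothetical, not OSArchiver functions. The date layout substituted for `{date}` (ISO `YYYY-MM-DD`) is an assumption, since the actual layout is not specified here, and the list splitting also accepts newlines.

```python
import re
from datetime import date


def expand_directory(template, today):
    """Replace the {date} keyword with a date string."""
    # Assumption of this sketch: the date is rendered as YYYY-MM-DD.
    return template.replace("{date}", today.isoformat())


def parse_formats(raw):
    """Split a comma, semicolon or carriage return separated list."""
    return [item.strip() for item in re.split(r"[,;\r\n]+", raw) if item.strip()]


directory = expand_directory("/backup/archive_{date}", date(2020, 1, 31))
formats = parse_formats("csv; sql")
print(directory, formats)
```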

# Contributing

You've developed a cool new feature? Fixed an annoying bug? We'd be happy to
hear from you!

Have a look at [CONTRIBUTING.md](https://github.com/ovh/osarchiver/blob/master/CONTRIBUTING.md)

# Related links

 * Contribute: https://github.com/ovh/osarchiver/blob/master/CONTRIBUTING.md
 * Report bugs: https://github.com/ovh/osarchiver/issues

# License

See https://github.com/ovh/osarchiver/blob/master/LICENSE