Riak CS Configuration Reference

Note on Legacy Configuration Usage

If you choose to use the legacy app.config files for Riak CS and/or Stanchion, some parameters have changed names and must be updated.

In particular, for the Riak CS app.config:

  • cs_ip and cs_port have been combined into listener.
  • riak_ip and riak_pb_port have been combined into riak_host.
  • stanchion_ip and stanchion_port have been combined into stanchion_host.
  • admin_ip and admin_port have been combined into admin_listener.
  • webmachine_log_handler has become webmachine_access_log_handler.
  • {max_open_files, 50} has been deprecated and should be replaced with {total_leveldb_mem_percent, 30}.

For the Stanchion app.config:

  • stanchion_ip and stanchion_port have been combined into listener.
  • riak_ip and riak_port have been combined into riak_host.

Each of the above pairs follows a similar form. For example, if your legacy app.config configuration was previously:

  {riak_cs, [
      {cs_ip, "127.0.0.1"},
      {cs_port, 8080},
      . . .
  ]},

It should now read:

  {riak_cs, [
      {listener, {"127.0.0.1", 8080}},
      . . .
  ]},

The other pairs follow the same pattern. More details can be found in the configuring Riak CS documentation.
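As a sketch of how the remaining Riak CS pairs might look after migration, the fragment below applies the same combined-tuple form to riak_host, stanchion_host, and admin_listener. The addresses and ports are simply the defaults listed later in this reference, and the legacy values shown in the comments are assumed for illustration.

```erlang
%% Migrated riak_cs section (illustrative values only).
[
 {riak_cs, [
     %% was: {riak_ip, "127.0.0.1"}, {riak_pb_port, 8087}
     {riak_host, {"127.0.0.1", 8087}},
     %% was: {stanchion_ip, "127.0.0.1"}, {stanchion_port, 8085}
     {stanchion_host, {"127.0.0.1", 8085}},
     %% was: {admin_ip, "127.0.0.1"}, {admin_port, 8000}
     {admin_listener, {"127.0.0.1", 8000}}
 ]}
].
```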

This document is intended as a reference listing of all configurable parameters for Riak CS. For a more narrative-style walkthrough of configuring Riak CS, we recommend consulting the configuring Riak CS tutorial.

The configuration for Riak CS is handled through either the riak-cs.conf and advanced.config file pair, which were introduced in Riak CS 2.0.0, or the two old-style app.config and vm.args files. All configuration files will be located in each Riak CS node’s /etc directory. Please note that you may only use one of these pairs at a time, as the app.config/vm.args pair will take priority over the new-style configuration files.

If you are using it, the vm.args file will house settings related to the Erlang VM on which both Riak and Riak CS run. These settings have been folded into the riak-cs.conf and riak.conf configuration files in newer systems.

The app.config and advanced.config files share an identical format, and can control all of Riak CS’s behaviors. The files are divided into the following sections:

  • riak_cs — Most settings are housed in this section of the file
  • webmachine — Settings related to Webmachine, the HTTP server framework that Riak CS uses for HTTP connections
  • lager — Settings for lager, the Erlang logging framework used by Riak CS
  • sasl — There is only one setting in this section, sasl_error_logger, which determines whether and how Riak CS uses Erlang’s SASL error logger

Most of the settings you will need to manipulate have been ported into the newer riak-cs.conf configuration format, but there may be some advanced settings – such as setting up customized lager streams – that will need to be configured in advanced.config.

A Note About Time Values

In the app.config configuration files, time periods were generally written as either seconds or milliseconds, with no real indication of which was being used. With the update to riak-cs.conf, every value that describes a period of time is written as an integer followed by a unit suffix: the integer gives the number of units and the suffix names the unit. For example, 31d represents 31 days, 6h represents six hours, and 6000ms represents 6,000 milliseconds.

The full list of valid time units is as follows:

  • f – Fortnights
  • w – Weeks
  • d – Days
  • h – Hours
  • m – Minutes
  • s – Seconds
  • ms – Milliseconds
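To make the unit difference concrete, the sketch below lists three garbage collection settings in their advanced.config form (plain seconds), with the matching riak-cs.conf time-value noted in a comment. The pairings are taken from the defaults given in the Garbage Collection tables of this reference.

```erlang
%% advanced.config values are bare integers (seconds here);
%% the comments show the equivalent riak-cs.conf time-value.
[
 {riak_cs, [
     {gc_interval, 900},          %% riak-cs.conf: gc.interval = 15m
     {gc_retry_interval, 21600},  %% riak-cs.conf: gc.retry_interval = 6h
     {leeway_seconds, 86400}      %% riak-cs.conf: gc.leeway_period = 24h
 ]}
].
```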

The tables below will show settings for both riak-cs.conf and advanced.config/app.config where applicable, organized by functionality.

Connection Information

riak-cs.conf

Config | Description | Default
listener | The IP address/port for the Riak CS node | 127.0.0.1:8080
riak_host | The IP address/port for the Riak CS node's corresponding Riak node (used by Riak's Protocol Buffers interface) | 127.0.0.1:8087
root_host | The root host name accepted by Riak CS. Changing this setting to, for example, my_cs_host would enable users to make requests to a URL such as http://bucket.my_cs_host/object/ (or to the corresponding HTTP host). | s3.amazonaws.com

advanced.config/app.config

Config | Description | Default
listener | The IP address/port for the Riak CS node | {"127.0.0.1", 8080}
riak_host | The IP address/port for the Riak CS node's corresponding Riak node (used by Riak's Protocol Buffers interface) | {"127.0.0.1", 8087}
cs_root_host | The root host name accepted by Riak CS. Changing this setting to, for example, my_cs_host would enable users to make requests to a URL such as http://bucket.my_cs_host/object/ (or to the corresponding HTTP host). | s3.amazonaws.com
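A minimal sketch of these three settings in the riak_cs section of advanced.config might look like the following. The listener and riak_host values are the defaults from the table; my_cs_host is the example host name used above, not a real endpoint.

```erlang
[
 {riak_cs, [
     %% Address and port on which this Riak CS node accepts requests
     {listener, {"127.0.0.1", 8080}},
     %% Protocol Buffers endpoint of the corresponding Riak node
     {riak_host, {"127.0.0.1", 8087}},
     %% Example root host; requests would then use bucket.my_cs_host
     {cs_root_host, "my_cs_host"}
 ]}
].
```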

Connection Pools

Riak CS enables you to establish connection pools for normal requests (such as GET and PUT) as well as for bucket listing requests.

riak-cs.conf

Config | Description | Default
pool.request.size | Fixed-size setting for the general request pool for Riak CS. Please note that we recommend setting Riak's protobuf.backlog setting to be higher than pool.request.size's fixed size, i.e. higher than 128. The default for protobuf.backlog is 128. | 128
pool.request.overflow | Overflow-size setting for the general request pool for Riak CS. | 0
pool.list.size | Fixed-size setting for the bucket listing request pool for Riak CS. | 5
pool.list.overflow | Overflow-size setting for the bucket listing request pool for Riak CS. | 0

advanced.config/app.config

In these files, each pool is specified as a nested tuple of the following form:

{riak_cs, [
           {Name, {FixedSize, OverflowSize}}
          ]}

Config | Description | Default
request_pool | Settings for the general request pool for Riak CS. Please note that we recommend setting Riak's pb_backlog setting higher than request_pool's fixed size, i.e. higher than 128. The default for pb_backlog is 128. | {128, 0}
bucket_list_pool | Settings for the bucket listing request pool for Riak CS | {5, 0}
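Filling in the {Name, {FixedSize, OverflowSize}} form with the defaults from the table gives a fragment along these lines. This is only a sketch; the sizes should be chosen for your own workload.

```erlang
[
 {riak_cs, [
     %% General request pool: 128 fixed connections, no overflow.
     %% Riak's pb_backlog should be set higher than the fixed size.
     {request_pool, {128, 0}},
     %% Bucket listing pool: 5 fixed connections, no overflow.
     {bucket_list_pool, {5, 0}}
 ]}
].
```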

Stanchion

riak-cs.conf

Config | Description | Default
stanchion_host | The IP address/port for the Stanchion node in the cluster. Please note that there should be only one Stanchion node in the cluster. | 127.0.0.1:8085
stanchion_ssl | Whether SSL is enabled for connections between the Riak CS node and Stanchion | off

advanced.config/app.config

Config | Description | Default
stanchion_host | The IP address/port for the Stanchion node in the cluster. Please note that there should be only one Stanchion node in the cluster. | {"127.0.0.1", 8085}
stanchion_ssl | Whether SSL is enabled for connections between the Riak CS node and Stanchion | false

Admin and Authentication Settings

riak-cs.conf

Config | Description | Default
admin.listener | You have the option to provide a special endpoint for performing system administration tasks in Riak CS. This setting sets the IP address and port for that endpoint. If you leave this setting commented out, then administrative tasks use the same IP and port as all other Riak CS traffic. | 127.0.0.1:8000
admin.key | The admin key used for administrative access to Riak CS, e.g. usage of the /riak-cs/stats endpoint. Please note that both admin.key and admin.secret must match the corresponding settings in the Stanchion node's stanchion.conf. | admin-key
admin.secret | The admin secret used for administrative access to Riak CS. See the description for admin.key above for more information. | admin-secret
anonymous_user_creation | You will need to set this parameter to on to allow for the creation of an admin user when setting up a new Riak CS cluster. We recommend, however, that you enable anonymous user creation only temporarily, unless your use case specifically dictates that anonymous users should be able to create accounts. | off
auth_module | The module used by Riak CS for authentication. We do not recommend changing this setting unless you implement a custom authentication scheme. | riak_cs_s3_auth
rewrite_module | A rewrite module contains a set of rules for translating requests made using a particular API to requests in the native Riak CS storage API. We do not recommend changing this setting unless you implement a custom module. | riak_cs_s3_rewrite

advanced.config/app.config

Config | Description | Default
admin_listener | You have the option to provide a special endpoint for performing system administration tasks in Riak CS. This setting sets the IP address and port for that endpoint. If you leave this setting commented out, then administrative tasks use the same IP and port as all other Riak CS traffic. | {"127.0.0.1", 8000}
admin_key | The admin key used for administrative access to Riak CS, e.g. usage of the /riak-cs/stats endpoint. Please note that both admin_key and admin_secret must match the corresponding settings in the Stanchion node's app.config. |
admin_secret | The admin secret used for administrative access to Riak CS. See the description for admin_key above for more information. |
anonymous_user_creation | You will need to set this parameter to true to allow for the creation of an admin user when setting up a new Riak CS cluster. We recommend, however, that you enable anonymous user creation only temporarily, unless your use case specifically dictates that anonymous users should be able to create accounts. | false
auth_module | The module used by Riak CS for authentication. We do not recommend changing this setting unless you implement a custom authentication scheme. | riak_cs_s3_auth
max_buckets_per_user | The number of buckets that can be created by each user. If a user exceeds the bucket creation limit, they are still able to perform other actions, including bucket deletion. | 100
rewrite_module | A rewrite module contains a set of rules for translating requests made using a particular API to requests in the native Riak CS storage API. We do not recommend changing this setting unless you implement a custom module. | riak_cs_s3_rewrite
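The fragment below sketches how the admin settings might sit together in advanced.config. The key and secret strings are placeholders (borrowed from the riak-cs.conf defaults above), not working credentials; real values must match those configured for Stanchion.

```erlang
[
 {riak_cs, [
     %% Dedicated endpoint for administrative requests
     {admin_listener, {"127.0.0.1", 8000}},
     %% Placeholder credentials; replace with your admin user's values
     {admin_key, "admin-key"},
     {admin_secret, "admin-secret"},
     %% Enable only temporarily, while creating the first admin user
     {anonymous_user_creation, false},
     {max_buckets_per_user, 100}
 ]}
].
```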

Usage Recording

These settings relate to Riak CS’s access logs.

riak-cs.conf

Config | Description | Default
stats.access.archive_period | How large each access archive object is. This setting should be a multiple of stats.access.flush_factor. Expressed as a time-value. | 1h
stats.access.archiver.max_backlog | The number of access logs that are allowed to accumulate in the archiver's queue before it begins skipping to catch up. Expressed as an integer number of logs. | 2
stats.access.flush_factor | How often the access log should be flushed, as a factor of stats.access.archive_period, where 1 means once per period, 2 means twice per period, etc. | 1
access_log_flush_size | The additional access log flush trigger. After this many accesses have been recorded, the log will be flushed, even if the flush interval has not expired. Expressed as an integer number of accesses. | 1000000
riak_cs.usage_request_limit | How many archive periods a user can request in one usage read, applied independently to access/usage and billing/storage. Expressed as a time-value. | 31d
stats.storage.schedule.$time | When to automatically start storage calculation batches. Expressed as an HHMM UTC time. For example, 0600 would calculate at 6 am UTC every day. If you would like to schedule multiple batches, change $time for each entry. For example, stats.storage.schedule.2 = 1800 could be the second entry, scheduled for 6:00 pm UTC. | 0600
stats.storage.archive_period | The size of each storage archive object. Should be chosen such that each stats.storage.schedule-based calculation falls in a different period. Expressed as a time-value. | 1h

advanced.config/app.config

Config | Description | Default
access_archive_period | How large each access archive object is. This setting should be a multiple of access_log_flush_factor. Expressed as an integer number of seconds (e.g. 3600 translates to 1 hour). | 3600
access_archive_max_backlog | The number of access logs that are allowed to accumulate in the archiver's queue before it begins skipping to catch up. Expressed as an integer number of logs. | 2
access_log_flush_factor | How often the access log should be flushed, as a factor of access_archive_period, where 1 means once per period, 2 means twice per period, etc. | 1
access_log_flush_size | The additional access log flush trigger. After this many accesses have been recorded, the log will be flushed, even if the flush interval has not expired. Expressed as an integer number of accesses. | 1000000
usage_request_limit | How many archive periods a user can request in one usage read, applied independently to access/usage and billing/storage. Expressed as an integer number of intervals. The default of 744 thus translates to one month at one-hour intervals. | 744
storage_schedule | When to automatically start storage calculation batches. Expressed as a list of HHMM UTC times. For example, ["0600"] would calculate at 6 am UTC every day, ["0600", "1945"] would calculate at 6 am and 7:45 pm UTC every day, and so on. | []
storage_archive_period | The size of each storage archive object. Should be chosen such that each storage_schedule-based calculation falls in a different period. Expressed as an integer number of seconds. The default of 86400 translates to 1 day. | 86400
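As an illustration of the usage-recording settings, the sketch below schedules two storage calculation batches per day using the example times from the table and leaves the other values at their listed defaults.

```erlang
[
 {riak_cs, [
     %% One access archive object per hour, flushed once per period
     {access_archive_period, 3600},
     {access_log_flush_factor, 1},
     %% Storage calculation at 06:00 and 19:45 UTC every day
     {storage_schedule, ["0600", "1945"]},
     %% One storage archive object per day
     {storage_archive_period, 86400},
     %% 744 one-hour intervals, i.e. roughly one month per usage read
     {usage_request_limit, 744}
 ]}
].
```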

Garbage Collection

Settings related to Riak CS’s garbage collection (GC) process.

riak-cs.conf

Config | Description | Default
gc.interval | How long the GC daemon waits between GC batch operations. Expressed as a time-value. | 15m
gc.max_workers | The maximum number of worker processes that may be started by the GC daemon to use for concurrent reaping of GC-eligible objects. | 2
gc.retry_interval | How long a move to the GC to-do list can remain failed before it is re-attempted. Expressed as a time-value. | 6h
gc.leeway_period | How long to retain the block for an object after it has been deleted. This leeway period is set to give the delete indication enough time to propagate to all replicas. Expressed as a time-value. | 24h

advanced.config/app.config

Config | Description | Default
epoch_start | The time that the GC daemon uses to begin collecting keys from the GC eligibility bucket. Records in this bucket use keys based on the epoch time the record is created plus leeway_seconds. The default is 0 and should be sufficient for general use. A case for readjusting this value is if the secondary index query run by the GC daemon continually times out. Raising the starting value can decrease the range of the query and make it more likely that the query will succeed. The value must be specified in Erlang binary format, e.g. set it to `<<10>>` to specify 10. | 0
gc_batch_size | This option is used only when gc_paginated_indexes is set to true. It represents the size used for paginating the results of the secondary index query. | 1000
gc_interval | How long the GC daemon waits between GC batch operations. Expressed as an integer number of seconds. | 900 (15 minutes)
gc_max_workers | The maximum number of worker processes that may be started by the GC daemon to use for concurrent reaping of GC-eligible objects. | 5
gc_paginated_indexes | If you're running Riak nodes that are of a version prior to 1.4.0, set this to false. Otherwise, you will not need to adjust this setting. | true
gc_retry_interval | How long a move to the GC to-do list can remain failed before it is re-attempted. Expressed as an integer number of seconds. | 21600 (6 hours)
leeway_seconds | The number of seconds to retain the block for an object after it has been deleted. This leeway time is set to give the delete indication time to propagate to all replicas. Expressed as an integer number of seconds. | 86400 (24 hours)
max_scheduled_delete_manifests | The maximum number of manifests (representative of object versions) that can be in the scheduled_delete state for a given key. A value of unlimited means that there is no maximum and that pruning will not be based on count. An example of where this option is useful is a use case involving a lot of churn on a fixed set of keys in a time frame that is relatively short compared to the leeway_seconds value. This can result in the manifest objects reaching a size that can negatively impact system performance. | unlimited
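For the high-churn case described under max_scheduled_delete_manifests, a configuration along the following lines might be a starting point. The cap of 50 is a hypothetical value chosen for illustration; the remaining settings are the defaults from the table.

```erlang
[
 {riak_cs, [
     %% Paginate the GC secondary index query, 1000 results per page
     {gc_paginated_indexes, true},
     {gc_batch_size, 1000},
     %% Up to 5 concurrent reaper processes
     {gc_max_workers, 5},
     %% Hypothetical cap; the default is the atom 'unlimited'
     {max_scheduled_delete_manifests, 50}
 ]}
].
```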

Concurrency and Buffering

advanced.config/app.config Only

There are two parameters related to concurrency and buffering that you should consider adding to your Riak CS settings if you are having issues with PUT requests. Raising the value of both of these settings may provide higher single-client throughput.

Config | Description | Default
put_buffer_factor | The number of blocks that will be buffered in-memory in Riak CS before it begins to slow down reading from the HTTP client. | 1
put_concurrency | The number of threads inside of Riak CS that are used to write blocks to Riak. | 1
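A sketch of raising both settings together is shown below. The value 4 is purely illustrative and not a recommendation from this reference; suitable values depend on available memory and on the Riak cluster's write capacity.

```erlang
[
 {riak_cs, [
     %% Buffer up to 4 blocks in memory per PUT (default is 1)
     {put_buffer_factor, 4},
     %% Write up to 4 blocks to Riak concurrently (default is 1)
     {put_concurrency, 4}
 ]}
].
```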

Miscellaneous Settings

riak-cs.conf

Config | Description | Default
cs_version | The Riak CS version number. This number is used to selectively enable new features for the current version to better support rolling upgrades. New installs shouldn't need to modify this. If you're performing a rolling upgrade, keep the original value (if not defined, Riak CS uses 0) of the old app.config until all nodes have been upgraded. At that point, set it to the new value. | 10300
dtrace | If your Erlang VM supports DTrace or SystemTap, set this parameter to on. | off
trust_x_forwarded_for | If your load balancer adds an X-Forwarded-For header and is reliable, i.e. the load balancer is able to guarantee that it is not added by a malicious user, set this option to on. Otherwise, Riak CS takes the source IP address as an input (which is the default). | off

advanced.config/app.config

Config | Description | Default
cs_version | The Riak CS version number. This number is used to selectively enable new features for the current version to better support rolling upgrades. New installs shouldn't need to modify this. If you're performing a rolling upgrade, keep the original value (if not defined, Riak CS uses 0) of the old app.config until all nodes have been upgraded. At that point, set it to the new value. |
dtrace_support | If your Erlang VM supports DTrace or SystemTap, set this parameter to true. | false
fold_objects_for_list_keys | If your Riak CS cluster is running Riak nodes prior to version 1.4.0, set this parameter to false. Otherwise, you will not need to modify it. This setting has been deprecated and will be removed in the next major version. | true
n_val_1_get_requests | If set to true, Riak CS will use a special request option when retrieving the blocks of an object. This special option instructs Riak to only send a request for the object block to a single eligible virtual node (vnode) instead of to all eligible vnodes. This differs from the standard r request option provided by Riak in that r affects how many vnode responses to wait for before returning and has no effect on how many vnodes are actually contacted. Enabling this option (the default) has the effect of greatly reducing the intra-cluster bandwidth used by Riak when retrieving objects with Riak CS. This option is harmless when used with a version of Riak prior to 1.4.0, but the option to disable is provided as a safety measure. This setting has been deprecated and will be removed in the next major version. | true
trust_x_forwarded_for | If your load balancer adds an X-Forwarded-For header and is reliable, i.e. the load balancer is able to guarantee that it is not added by a malicious user, set this option to true. Otherwise, Riak CS takes the source IP address as an input (which is the default). | false

Timeouts on each Riak call

Because Riak CS stores all of its data in the underlying Riak cluster, Riak CS processes communicate with Riak over its Protocol Buffers API. Each of these is a remote call with its own timeout, which can be tuned to your system's requirements to avoid unnecessary timeouts.

In Riak CS 1.5.3 or later, configurations under the riakc section are unavailable. Instead, timeouts are configurable per access case, which enables fine-grained tuning or an ad hoc response to issues in a production environment. These items are configurable only in the riak_cs section of advanced.config. All values in the table below are in milliseconds.

Config | Description | Default
ping_timeout | A timeout value used in the ping API | 5000
get_user_timeout | A timeout value on retrieving user information for authentication | 60000
get_bucket_timeout | A timeout value on retrieving bucket information, for ACL or policy information | 60000
get_manifest_timeout | A timeout value on retrieving the manifest of a key | 60000
get_block_timeout | A timeout value on retrieving a chunk of an object | 60000
local_block_timeout | A timeout value on retrieving a local chunk of an object | 5000
proxy_get_block_timeout | A timeout value on a proxy get request to a remote cluster (EE only) | 60000
get_access_timeout | A timeout value on retrieving timeslot information of access statistics | 60000
get_gckey_timeout | A timeout value on retrieving a key in the GC bucket | 60000
put_manifest_timeout | A timeout value on putting a new manifest | 60000
put_block_timeout | A timeout value on putting a chunk of an object | 60000
put_access_timeout | A timeout value on putting an entry into access statistics | 60000
put_gckey_timeout | A timeout value on putting an entry into the GC bucket | 60000
put_user_usage_timeout | A timeout value on storing the result of the storage calculation of each user | 60000
delete_manifest_timeout | A timeout value on deleting a manifest in garbage collection | 60000
delete_block_timeout | A timeout value on deleting a chunk of an object in garbage collection | 60000
delete_gckey_timeout | A timeout value on deleting an entry in the GC bucket | 60000
list_keys_list_objects_timeout | A timeout value on listing objects of a bucket, older version (will be removed in 2.x) | 60000
list_keys_list_users_timeout | A timeout value on listing users | 60000
storage_calc_timeout | A timeout value on running storage calculation on a bucket | 60000
list_objects_timeout | A timeout value on listing objects of a bucket, older version (will be removed in 2.x) | 60000
fold_objects_timeout | A timeout value on listing objects of a bucket (default since 1.5.0) | 60000
get_index_range_gckeys_timeout | A timeout value on listing keys in the garbage collection bucket, overall call | 60000
get_index_range_gckeys_call_timeout | A timeout value on listing keys in the garbage collection bucket, each continuation call | 60000
get_index_list_multipart_uploads_timeout | A timeout value on listing incomplete multipart uploads of an object | 60000
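As a sketch, overriding a few of these timeouts in the riak_cs section of advanced.config might look like the following. The 120000 ms figures are illustrative only; any timeout not listed keeps its default from the table.

```erlang
[
 {riak_cs, [
     %% Allow block reads and writes up to 2 minutes (illustrative)
     {get_block_timeout, 120000},
     {put_block_timeout, 120000},
     %% Keep the ping timeout at its default of 5 seconds
     {ping_timeout, 5000}
 ]}
].
```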

Webmachine

advanced.config/app.config Only

Settings specific to Webmachine, the web server that handles all HTTP and HTTPS connections to Riak CS. The riak_cs_access_log_handler and webmachine_access_log_handler settings are part of a log_handlers sub-grouping:

{webmachine, [
              %% Other configs
              {log_handlers, [
                              {webmachine_access_log_handler, ...},
                              {riak_cs_access_log_handler, ...}
                              ]},
              %% Other configs
             ]}

Config | Description | Default
server_name | |
webmachine_access_log_handler | If this setting is commented out or removed, Webmachine access logging will be disabled. | ["./log"]
riak_cs_access_log_handler | We do not recommend changing or removing this setting. | []

Logging

advanced.config/app.config Only

These settings relate to lager, the Erlang logging framework used by Riak CS. They are included in the lager settings in app.config.

The lager_console_backend and lager_file_backend settings are part of a handlers sub-group:

{lager, [
         %% Other configs
         {handlers, [
                     {lager_console_backend, ...},
                     {lager_file_backend, ...}
                    ]},
         %% Other configs
        ]}

Config | Description | Default
lager_console_backend | See the lager documentation for more details. |
lager_file_backend | See the lager documentation for more details. |

Config | Description | Default
crash_log | Whether to write to a crash log and where. If commented out, omitted, or undefined, no crash logging will take place. | ./log/crash.log
crash_log_count | The number of crash logs to keep. Setting this parameter to 0 (the default) means that only the current log will be kept. | 0
crash_log_date | When to rotate the crash log. The default is no time rotation. For documentation on the syntax of this parameter, see the lager documentation. | $D0
crash_log_msg_size | The maximum size of events in the crash log, expressed as a number of bytes. | 65536
crash_log_size | The maximum size of the crash log, in bytes, before it is rotated. Setting this parameter to 0 disables rotation. | 10485760
error_logger_redirect | Whether to redirect error_logger messages into lager. | true
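Putting the crash log settings together, a lager section using the defaults from the table might be sketched as follows. The handlers entry is omitted here for brevity; see the sub-group form shown earlier in this section.

```erlang
[
 {lager, [
     %% Where crash reports are written
     {crash_log, "./log/crash.log"},
     %% Largest single event, in bytes
     {crash_log_msg_size, 65536},
     %% Rotate when the file reaches 10 MB
     {crash_log_size, 10485760},
     %% Time-based rotation spec, in lager's rotation syntax
     {crash_log_date, "$D0"},
     %% Keep only the current crash log
     {crash_log_count, 0},
     %% Route error_logger messages through lager
     {error_logger_redirect, true}
 ]}
].
```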

SASL

advanced.config/app.config Only

Config | Description | Default
sasl_error_logger | Whether to enable SASL, Erlang's built-in error logger. | false