Developing with Riak KV
Data Types:GSets

GSets are a bucket-level Riak data type that can be used by themselves or associated with a bucket/key pair. They do not yet have the ability to be used within a map like regular sets.

GSets are collections of unique binary values (such as strings). All of the values in a gset are unique and are automatically sorted alphabetically irresepective of the order they were added.

For example, if you attempt to add the element shovel to a set that already contains shovel, the operation will be ignored by Riak KV.

Unlike sets, elements can only be added and no element modification or deletion is possible.

Known Issue

Unlike other data types, gsets require other data to be present in the cluster before they can be created. If you are unable to create a gset on a new cluster, please try creating a set first and then retrying with your gset. Please see issue #950 for details.

Set Up a Bucket Type

If you’ve already created and activated a bucket type with gset as the datatype parameter, skip to the next section.

Start by creating a bucket type with the datatype parameter gset:

riak-admin bucket-type create gsets '{"props":{"datatype":"gset"}}'

Note

The gsets bucket type name provided above is an example and is not required to be gsets. You are free to name bucket types whatever you like, with the exception of default.

After creating a bucket with a Riak data type, confirm the bucket property configuration associated with that type is correct:

riak-admin bucket-type status gsets

This returns a list of bucket properties and their values in the form of property: value.

If our gsets bucket type has been set properly we should see the following pair in our console output:

datatype: gset

Once we have confirmed the bucket type is properly configured, we can activate the bucket type to be used in Riak KV:

riak-admin bucket-type activate gsets

We can check if activation has been successful by using the same bucket-type status command shown above:

riak-admin bucket-type status gsets

After creating and activating our new gsets bucket type, we can setup our client to start using the bucket type as detailed in the next section.

Client Setup

Using sets involves creating a bucket/key pair to house a gset and running gset-specific operations on that pair.

Here is the general syntax for creating a bucket type/bucket/key combination to handle a gset:

// In the Java client, a bucket/bucket type combination is specified
// using a Namespace object. To specify bucket, bucket type, and key,
// use a Location object that incorporates the Namespace object, as is
// done below.

Location set =
  new Location(new Namespace("<bucket_type>", "<bucket>"), "<key>");
# Note: both the Riak Ruby Client and Ruby the language have a class
# called Set. Make sure that you refer to the Ruby version as ::Set and
# the Riak client version as Riak::Crdt::Set

bucket = client.bucket_type('bucket_type_name').bucket('bucket_name')
set = Riak::Crdt::GrowOnlySet.new(bucket, key)
$location = new \Basho\Riak\Location('key', new \Basho\Riak\Bucket('bucket_name', 'bucket_type'));
gset = bucket.new('2019-11-17')

# or

from riak.datatypes import GSet
gset = GSet('account-12345678', '2019-11-17')
// https://github.com/basho/riak-dotnet-client/blob/develop/src/RiakClientExamples/Dev/Using/DataTypes.cs

// As with counters, with the Riak .NET Client you interact with gsets
// by building an Options object or using a Builder
var builder = new FetchGSet.Builder()
    .WithBucketType("gsets")
    .WithBucket("account-12345678")
    .WithKey("2019-11-17");

// NB: builder.Options will only be set after Build() is called.
FetchGSet fetchGSetCommand = builder.Build();

FetchGSetOptions options = new FetchGSetOptions("gsets", "account-12345678", "2019-11-17");

// These two options objects are equal
Assert.AreEqual(options, builder.Options);
// As with counters, with the Riak Node.js Client you interact with gsets on the
// basis of the gset's location in Riak, as specified by an options object.
// Below is an example:
var options = {
    bucketType: 'gsets',
    bucket: 'account-12345678',
    key: '2019-11-17'
};
%% Like counters, sets are not encapsulated in a
%% bucket/key in the Erlang client. See below for more
%% information.
curl http://localhost:8098/types/<bucket_type>/buckets/<bucket>/datatypes/<key>

# Note that this differs from the URL structure for non-data type requests,
# which end in /keys/<key>

Create a GSet

For the following example, we will use a set to store a list of transactions that occur for an account number on a specific date. Let’s create a Riak gset stored in the key cities in the bucket travel using the gsets bucket type created previously:

// In the Java client, you specify the location of Data Types
// before you perform operations on them:

Location citiesSet =
  new Location(new Namespace("gsets", "travel"), "cities");
travel = client.bucket_type('sets').bucket('travel')
cities_set = Riak::Crdt::GrowOnlySet.new(travel, 'cities')

# Alternatively, the Ruby client enables you to set a bucket type as
# being globally associated with a Riak data type. The following would
# set all set buckets to use the sets bucket type:

Riak::Crdt::DEFAULT_BUCKET_TYPES[:set] = 'sets'

# This would enable us to create our set without specifying a bucket
# type:
travel = client.bucket('travel')
cities_set = Riak::Crdt::GrowOnlySet.new(travel, 'cities')
$location = new \Basho\Riak\Location('2019-11-17', 'account-12345678', 'gsets');
bucket = client.bucket_type('gsets').bucket('account-12345678')

# The client detects the bucket type's data type and automatically
# returns the right data type for you, in this case a Riak set.
gset = bucket.new('2019-11-17')

# You can also create a reference to a set explicitly:
from riak.datatypes import GSet

gset = GSet('account-12345678', '2019-11-17')
// https://github.com/basho/riak-dotnet-client/blob/develop/src/RiakClientExamples/Dev/Using/DataTypes.cs

// Now we'll create a Builder object for the gset with which we want to
// interact:
var builder = new FetchGSet.Builder()
    .WithBucketType("gsets")
    .WithBucket("account-12345678")
    .WithKey("2019-11-17");
// Now we'll create a options object for the gset with which we want to
// interact:
var options = {
    bucketType: 'gsets',
    bucket: 'account-12345678',
    key: '2019-11-17'
};
20191177Gset = riakc_gset:new().

%% GSets in the Erlang client are opaque data structures that
%% collect operations as you mutate them. We will associate the data
%% structure with a bucket type, bucket, and key later on.
# You cannot create an empty gset through the HTTP interface. GSets can
# only be created when an element is added to them, as in the examples
# below.

Upon creation, our set is empty. We can verify that it is empty at any time:

// Using our "cities" Location from above:

FetchSet fetch = new FetchSet.Builder(citiesSet)
    .build();
FetchSet.Response response = client.execute(fetch);
RiakSet set = response.getDatatype();
boolean isEmpty = set.viewAsSet().isEmpty();
cities_set.empty?
# use $location from earlier
$gset = (new \Basho\Riak\Command\Builder\FetchSet($riak))
    ->atLocation($location)
    ->build()
    ->execute()
    ->getSet();

count($gset->getData());
len(gset) == 0
// https://github.com/basho/riak-dotnet-client/blob/develop/src/RiakClientExamples/Dev/Using/DataTypes.cs

var builder = new FetchGSet.Builder()
    .WithBucketType("gsets")
    .WithBucket("account-12345678")
    .WithKey("2019-11-17");

FetchGSet fetchGSetCommand = builder.Build();
RiakResult rslt = client.Execute(fetchGSetCommand);
GSetResponse response = fetchGSetCommand.Response;
// response.Value will be null
var options = {
    bucketType: 'gsets',
    bucket: 'account-12345678',
    key: '2019-11-17'
};
client.fetchSet(options, function (err, rslt) {
    if (err) {
        throw new Error(err);
    }

    if (rslt.notFound) {
        logger.info("gset '2019-11-17' is not found!");
    }
});
riakc_gset:size(20191117Gset) == 0.

%% Query functions like size/1, is_element/2, and fold/3 operate over
%% the immutable value fetched from the server. In the case of a new
%% gset that was not fetched, this is an empty collection, so the size
%% is 0.
curl http://localhost:8098/types/gsets/buckets/account-12345678/datatypes/2019-11-17

# Response
{"type":"set","error":"notfound"}

Add to a GSet

But let’s say that a pair of transactions occurred today. Let’s add them to our 2019-11-17 set:

// Using our "cities" Location from above:

GSetUpdate su = new GSetUpdate()
        .add("Toronto")
        .add("Montreal");
UpdateSet update = new UpdateSet.Builder(citiesSet, su)
        .build();
client.execute(update);
cities_set.add('Toronto')
cities_set.add('Montreal')
# use $location from earlier
$response = (new \Basho\Riak\Command\Builder\UpdateSet($riak))
    ->add('transaction a')
    ->add('transaction b')
    ->atLocation($location)
    ->withParameter('returnbody', 'true')
    ->build()
    ->execute();
gset.add('transaction a')
gset.add('transaction b')
// https://github.com/basho/riak-dotnet-client/blob/develop/src/RiakClientExamples/Dev/Using/DataTypes.cs

var adds = new[] { "transaction a", "transaction b" };

var builder = new UpdateGSet.Builder()
    .WithBucketType("gsets")
    .WithBucket("account-12345678")
    .WithKey("2019-11-17")
    .WithAdditions(adds);

UpdateGSet cmd = builder.Build();
RiakResult rslt = client.Execute(cmd);
GSetResponse response = cmd.Response;
Assert.Contains("transaction a", response.AsStrings.ToArray());
Assert.Contains("transaction b", response.AsStrings.ToArray());
var options = {
    bucketType: 'gsets',
    bucket: 'account-1234578',
    key: '2019-11-17'
};
var cmd = new Riak.Commands.CRDT.UpdateGSet.Builder()
    .withBucketType(options.bucketType)
    .withBucket(options.bucket)
    .withKey(options.key)
    .withAdditions(['transaction a', 'transaction b'])
    .withCallback(
        function (err, rslt) {
            if (err) {
                throw new Error(err);
            }
        }
    )
    .build();
client.execute(cmd);
20191117Gset1 = riakc_gset:add_element(<<"transaction a">>, 20191117Gset),
20191117Gset2 = riakc_gset:add_element(<<"transaction b">>, 20191117Gset1).
curl -XPOST http://localhost:8098/types/gsets/buckets/account-12345678/datatypes/2019-11-17 \
  -H "Content-Type: application/json" \
  -d '{"add_all":["transaction a", "transaction b"]}'

Remove from a GSet

Removal from a GSet is not possible.

Retrieve a GSet

Now, we can check on which transactions are currently in our gset:

// Using our "cities" Location from above:

FetchSet fetch = new FetchSet.Builder(citiesSet)
        .build();
FetchSet.Response response = client.execute(fetch);
Set<BinaryValue> binarySet = response.getDatatype().view();
for (BinaryValue city : binarySet) {
  System.out.println(city.toStringUtf8());
}
cities_set.members

#<Set: {"Hamilton", "Ottawa", "Toronto"}>
# use $location from earlier
$gset = (new \Basho\Riak\Command\Builder\FetchSet($riak))
    ->atLocation($location)
    ->build()
    ->execute()
    ->getSet();

var_dump($gset->getData());
gset.dirty_value

# The value fetched from Riak is always immutable, whereas the "dirty
# value" takes into account local modifications that have not been
# sent to the server. For example, where the call above would return
# frozenset(['Transaction a', 'Transaction b']), the call below would
# return frozenset([]).

gset.value

# To fetch the value stored on the server, use the call below. Note
# that this will clear any unsent additions.
gset.reload()
// https://github.com/basho/riak-dotnet-client/blob/develop/src/RiakClientExamples/Dev/Using/DataTypes.cs

foreach (var value in GSetResponse.AsStrings)
{
    Console.WriteLine("2019-11-17 Transactions: {0}", value);
}

// Output:
// 2019-11-17 Transactions: transaction a
// 2019-11-17 Transactions: transaction b
var options = {
    bucketType: 'gsets',
    bucket: 'account-12345678',
    key: '2019-11-17'
};
client.fetchSet(options, function(err, rslt) {
    if (err) {
        throw new Error(err);
    }

    logger.info("2019-11-17 gset values: '%s'",
        rslt.values.join(', '));
});

// Output:
// info: 2019-11-17 gset values: 'transaction a, transaction b'
riakc_gset:dirty_value(20191117Gset3).

%% The value fetched from Riak is always immutable, whereas the "dirty
%% value" takes into account local modifications that have not been
%% sent to the server. For example, where the call above would return
%% [<<"Hamilton">>, <<"Ottawa">>, <<"Toronto">>], the call below would
%% return []. These are essentially ordsets:

riakc_gset:value(20191117Gset3).

%% To fetch the value stored on the server, use the call below:

{ok, SetX} = riakc_pb_socket:fetch_type(Pid,
                                        {<<"gsets">>,<<"account-12345678">>},
                                         <<"20191117">>).
curl http://localhost:8098/types/gsets/buckets/account-12345678/datatypes/2019-11-17

# Response
{"type":"set","value":["transaction a","transaction b"]}

Find GSet Member

Or we can see whether our gset includes a specific member:

// Using our "citiesSet" from above:

FetchSet fetch = new FetchSet.Builder(citiesSet)
        .build();
FetchSet.Response response = client.execute(fetch);
Set<BinaryValue> binarySet = response.getDatatype().view();

System.out.println(binarySet.contains(BinaryValue.createFromUtf8("Vancouver")));
System.out.println(binarySet.contains(BinaryValue.createFromUtf8("Ottawa")));
cities_set.include? 'Vancouver'
# false

cities_set.include? 'Ottawa'
# true
in_array('transaction z', $gset->getData()); # false

in_array('transaction a', $gset->getData()); # true
'transaction c' in gset
# False

'transaction a' in gset
# True
// https://github.com/basho/riak-dotnet-client/blob/develop/src/RiakClientExamples/Dev/Using/DataTypes.cs

using System.Linq;

bool includesTransactionZ = response.AsStrings.Any(v => v == "transaction z");
bool includesTransactionA = response.AsStrings.Any(v => v == "transaction a");
// Use standard javascript array method indexOf()

var 2019-11-17_gset = result.values;
2019-11-17_gset.indexOf('transaction z'); // if present, index is >= 0
2019-11-17_gset.indexOf('transaction a'); // if present, index is >= 0
%% At this point, GSet3 is the most "recent" set from the standpoint
%% of our application.

riakc_gset:is_element(<<"transaction z">>, 20191117Gset3).
riakc_gset:is_element(<<"transaction a">>, 20191117Gset3).
# With the HTTP interface, this can be determined from the output of
# a fetch command like the one displayed in the example above

Size of GSet

We can also determine the size of the gset:

// Using our "citiesSet" from above:

int numberOfCities = citiesSet.size();
cities_set.members.length
count($gset->getData());
len(gset)
// https://github.com/basho/riak-dotnet-client/blob/develop/src/RiakClientExamples/Dev/Using/DataTypes.cs

using System.Linq;

// Note: this enumerates the IEnumerable
gsetResponse.Values.Count();
// Use standard javascript array property length

var 2019-11-17_gset_size = result.values.length;
riakc_gset:size(20191117Gset3).
# With the HTTP interface, this can be determined from the output of
# a fetch command like the one displayed in the example above