Redis Err Not All 16384 Slots Are Covered By Nodes

We are not able to use the redis-trib fix command to fix a cluster when the master and slave for a particular set of slots both go down at the same time.

Redis Cluster data sharding

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot. There are 16384 hash slots in Redis Cluster, and to compute the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
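To make the slot computation concrete, here is a minimal Python sketch. It assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0), which Python exposes as binascii.crc_hqx. The function name key_hash_slot is made up for this example, and hash tags (the {...} syntax) are deliberately ignored.

```python
import binascii

HASH_SLOTS = 16384  # total number of hash slots in Redis Cluster


def key_hash_slot(key: bytes) -> int:
    """Return the hash slot for a key: CRC16(key) modulo 16384.

    Simplification: hash tags ("{...}") are not handled here, so the
    whole key is always hashed.
    """
    # crc_hqx with an initial value of 0 computes CRC16 (XMODEM variant).
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


if __name__ == "__main__":
    for name in (b"foo", b"hello"):
        print(name.decode(), key_hash_slot(name))
```

With this sketch, key_hash_slot(b"foo") evaluates to 12182 and key_hash_slot(b"hello") to 866, the same slots that redis-cli reports when it redirects those keys in a cluster.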

Oct 11, 2014: [OK] All 16384 slots covered. But it gives no hint as to which node (ip:port) does not agree, and this time I can't use redis-trib.rb fix ip:port to fix it; its output is the same as that of reshard. The cluster_state is still ok, though.

redis 3.2.11
redis-cli 4.0.1
redis-trib (redis 3.3.3 gem)

Our use case is that we are writing a Redis cluster orchestrator, where nodes are added and removed often. When a node goes down, we want its slots to be covered as quickly as possible by other masters in the cluster. Also, we are only using Redis as a cache right now, so we don't necessarily care that assigning slots to another master results in data loss.

Steps to reproduce

  1. 3 masters, 1 slave per master
  2. Choose a master/slave pair (which covers a single set of slots) and delete both VM hosts at the same time
  3. Run /bin/redis-cli -p [port] -h [host] cluster forget [nodeIdOfDeletedMaster] on all other active nodes in the cluster
  4. Run /bin/redis-trib check [host:port]

Result:

redis-trib check shows missing slots message (as expected):

What is the recommended way to recover in this situation? Neither rebalance nor fix works for us. Should we be using addslots to manually assign slots to other masters?
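Before assigning anything by hand, it helps to know exactly which slots are uncovered. The Python sketch below parses the text returned by CLUSTER NODES and lists the missing slot ranges. It is only an illustration: the function names and the sample output (with shortened, hypothetical node IDs) are invented here, and skipping masters flagged fail is an assumption about what an orchestrator would want, not documented redis-trib behavior.

```python
from typing import List, Set, Tuple

HASH_SLOTS = 16384


def covered_slots(cluster_nodes_output: str) -> Set[int]:
    """Collect the slots owned by live masters in CLUSTER NODES output."""
    covered: Set[int] = set()
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete node line
        flags = fields[2].split(",")
        # Assumption: slots of masters already marked "fail" count as lost.
        if "master" not in flags or "fail" in flags:
            continue
        for token in fields[8:]:
            if token.startswith("["):
                continue  # migrating/importing marker, not an owned slot
            first, _, last = token.partition("-")
            covered.update(range(int(first), int(last or first) + 1))
    return covered


def missing_ranges(covered: Set[int]) -> List[Tuple[int, int]]:
    """Return the uncovered slots as inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    start = None
    for slot in range(HASH_SLOTS + 1):  # one extra step closes a final range
        is_missing = slot < HASH_SLOTS and slot not in covered
        if is_missing and start is None:
            start = slot
        elif not is_missing and start is not None:
            ranges.append((start, slot - 1))
            start = None
    return ranges


# Hypothetical CLUSTER NODES output after one master/slave pair was lost.
SAMPLE = """\
aaa1 10.0.0.1:7000 myself,master - 0 0 1 connected 0-5460
bbb2 10.0.0.2:7000 master - 0 1426238316232 2 connected 10923-16383
ccc3 10.0.0.3:7000 slave aaa1 0 1426238317239 1 connected
"""

if __name__ == "__main__":
    print(missing_ranges(covered_slots(SAMPLE)))
```

The resulting ranges are the slots that would have to be handed to surviving masters, for example with CLUSTER ADDSLOTS. Whether that is the right recovery path is exactly the open question above, so treat this as a diagnostic aid only.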

Result of rebalance:
Rebalance is not allowed to run until we fix the cluster.

Output of rebalance:

Fix distributes slots to both master and slave IPs (which surprised us, as we thought it would only use masters). After fix completes, the nodes don't agree about configuration, and gossip does not correct the disagreement, which is an even worse situation for the cluster. The issue is easy for us to reproduce.

Is there an issue with fix, or is it simply not correct to use it in our case, where slots can be lost because a master and all of its backup slaves go down at once? (We noticed in issue #3007 (comment) that fix was said not to handle this situation anyway.) In our use case, nodes come and go often, so we do not always know when a set of slots is going away completely.

Output of fix (truncated):

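Some time ago, one of the machines in our company's Redis cluster environment went down, which also caused some of the slot data shards to be lost.

When checking the cluster's running state with check, we hit an error: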

[root@node01 src]# ./redis-trib.rb check 172.168.63.202:7000
Connecting to node 172.168.63.202:7000: OK
Connecting to node 172.168.63.203:7000: OK
Connecting to node 172.168.63.201:7000: OK
>>> Performing Cluster Check (using node 172.168.63.202:7000)
M: 449de2d2a4b799ceb858501b5b78ab91504c72e0 172.168.63.202:7000
   slots: (0 slots) master
   0 additional replica(s)
M: db9d26b1d15889ad2950382f4f32639606f9a94b 172.168.63.203:7000
   slots: (0 slots) master
   0 additional replica(s)
M: f90924f71308eb434038fc8a5f481d3661324792 172.168.63.201:7000
   slots: (0 slots) master
   0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.


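Cause:

This usually happens because a master node was removed without first moving the slots off that node, so the total number of covered slots no longer reaches 16384. In other words, the slots are distributed incorrectly. So when deleting a node, always check whether the node being deleted is a master.

1) The official recommendation is to use redis-trib.rb fix to repair the cluster…. …. From cluster nodes we can see that the 7001 node has been killed… so: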

[root@node01 src]# ./redis-trib.rb fix 172.168.63.201:7001


After the fix completes, run check again to verify that the cluster is now correct:

[root@node01 src]# ./redis-trib.rb check 172.168.63.202:7000

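Any node in the cluster can be given as the argument; all related nodes are checked automatically. Look at the output to see whether every master now has slots. If the distribution is uneven, the slots can be reassigned as follows: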

[root@node01 src]# ./redis-trib.rb reshard 172.168.63.201:7001
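reshard asks how many slots to move and to which node, so it is handy to work out the numbers first. The Python sketch below computes an even target per master and how far each node is from it. The function names and the example slot counts are hypothetical, and the tie-breaking rule (nodes that already own the most slots get the slightly larger shares) is just one reasonable choice.

```python
from typing import Dict, List


def even_slot_targets(master_count: int, total_slots: int = 16384) -> List[int]:
    """Split total_slots as evenly as possible across master_count masters."""
    base, extra = divmod(total_slots, master_count)
    # The first `extra` masters take one additional slot each.
    return [base + 1 if i < extra else base for i in range(master_count)]


def reshard_deltas(current: Dict[str, int]) -> Dict[str, int]:
    """Map each node to the number of slots it should gain (+) or give up (-)."""
    # Nodes that already own the most slots receive the larger targets,
    # which keeps the number of moved slots small.
    nodes = sorted(current, key=lambda n: (-current[n], n))
    targets = even_slot_targets(len(nodes), sum(current.values()))
    return {node: target - current[node] for node, target in zip(nodes, targets)}


if __name__ == "__main__":
    # Hypothetical state: one master owns everything after a fix.
    print(reshard_deltas({"node-a": 16384, "node-b": 0, "node-c": 0}))
```

A positive value is the number of slots to give that node when reshard prompts for the slot count; the values always sum to zero.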