multipathd: "san_path_err" failure optimization
author Chongyun Wu <wu.chongyun@h3c.com>
Wed, 2 Oct 2019 06:48:07 +0000 (06:48 +0000)
committer Christophe Varoqui <christophe.varoqui@opensvc.com>
Wed, 2 Oct 2019 07:13:24 +0000 (09:13 +0200)
Detect paths that are still unstable during san_path_err_recovery_time,
and do not reinstate such a path until it has stayed up for the whole
san_path_err_recovery_time. This fixes the heavy IO delays caused by
some paths of a multipath device flapping between up and down.
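
The rule described above can be sketched in isolation as follows. This is
only a simplified model, not the multipathd code: the sketch_* names and
the reduced struct are hypothetical, and the real
check_path_reinstate_state() performs additional checks around this step.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for multipathd's path states. */
enum sketch_state { SKETCH_DOWN, SKETCH_UP, SKETCH_GHOST, SKETCH_DELAYED };

/* Hypothetical reduced view of a path; only the fields the rule needs. */
struct sketch_path {
	enum sketch_state state;   /* last state reported by the checker */
	time_t dis_reinstate_time; /* start of the current stability window */
	int forget_rate;           /* remaining forget-rate counter */
};

/*
 * Return true when the path may be reinstated at time `now`.
 * A path that is still failing restarts its stability window, so only a
 * path that stays healthy for more than `recovery_time` seconds passes.
 */
static bool sketch_may_reinstate(struct sketch_path *pp, time_t now,
				 int recovery_time, int cfg_forget_rate)
{
	assert(recovery_time > 0);

	if (pp->state != SKETCH_UP && pp->state != SKETCH_GHOST &&
	    pp->state != SKETCH_DELAYED) {
		/* Failed again: start a new stability check from now. */
		pp->forget_rate = cfg_forget_rate;
		pp->dis_reinstate_time = now;
	}
	return (now - pp->dis_reinstate_time) > recovery_time;
}
```

In this model, a path that fails at t=25 with a 30 second recovery time
cannot be reinstated before t=56, even if its previous window started
at t=0.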

Test and result:
Run a script that brings eth1 up for 30s and then down for 30s, for 100
loops, to make some paths shaky in each multipath device.
Use the following settings in the defaults section of multipath.conf:
    san_path_err_recovery_time 30
    san_path_err_threshold 2
    san_path_err_forget_rate 6
After the test, no IO delay logs were found except for a few at the very
beginning, before the san_path_err filtering of shaky paths took effect.
Without the above config and this patch, there are lots of IO delay logs
in syslog, and some path states change from up to down again and again.

Signed-off-by: Chongyun Wu <wu.chongyun@h3c.com>
multipathd/main.c

index 70172d7..34a5768 100644
@@ -1896,6 +1896,18 @@ static int check_path_reinstate_state(struct path * pp) {
                        goto reinstate_path;
                }
                get_monotonic_time(&curr_time);
+
+               /* If the path failed again or is still failed, reset its
+                * san_path_err_forget_rate and dis_reinstate_time to start
+                * a new stability check.
+                */
+               if ((pp->state != PATH_UP) && (pp->state != PATH_GHOST) &&
+                       (pp->state != PATH_DELAYED)) {
+                       pp->san_path_err_forget_rate =
+                               pp->mpp->san_path_err_forget_rate;
+                       pp->dis_reinstate_time = curr_time.tv_sec;
+               }
+
                if ((curr_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
                        condlog(2,"%s : reinstate the path after err recovery time", pp->dev);
                        goto reinstate_path;
@@ -2066,6 +2078,11 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
                pathinfo(pp, conf, 0);
                pthread_cleanup_pop(1);
                return 1;
+       } else if ((newstate != PATH_UP && newstate != PATH_GHOST) &&
+                       (pp->state == PATH_DELAYED)) {
+               /* If the path has failed again, cancel its delayed state */
+               pp->state = newstate;
+               return 1;
        }
        if (!pp->mpp) {
                if (!strlen(pp->wwid) &&