multipath-tools/.git
4 weeks agomultipath-tools: add GPLv2 as COPYING master
Xose Vazquez Perez [Tue, 27 Mar 2018 18:28:19 +0000 (20:28 +0200)]
multipath-tools: add GPLv2 as COPYING

Source directly from: https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt

Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
4 weeks agomultipath-tools: move COPYING to COPYING.LESSER
Xose Vazquez Perez [Tue, 27 Mar 2018 18:28:18 +0000 (20:28 +0200)]
multipath-tools: move COPYING to COPYING.LESSER

As recommended by FSF: https://www.gnu.org/licenses/gpl-howto.html

Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
4 weeks agomultipath: fix rcu thread cancellation hang
Benjamin Marzinski [Fri, 23 Mar 2018 20:00:46 +0000 (15:00 -0500)]
multipath: fix rcu thread cancellation hang

While the rcu code is waiting for a grace period to elapse, no threads
can register or unregister as rcu reader threads. If for some reason, a
thread never calls put_multipath_config() to exit a read side critical
section, then any threads trying to start or stop will hang. This can
happen if a thread is cancelled between calls to get_multipath_config()
and put_multipath_config(), and multipathd is reconfigured (which causes
the rcu code to wait for a grace period).

This patch fixes this issue in two ways. Where possible, it reorders the
code or saves config values into local variables to remove cancellation
points between calls to get_multipath_config() and
put_multipath_config().  In cases where this isn't possible (or where it
would cause a significant amount of extra work to be done) multipath now
pushes a cleanup handler to call put_multipath_config().

The only functions that were not modified were ones that were only
called by multipath or mpathpersist, since these are single threaded
and already disable rcu thread registration.
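
Illustration (not part of the patch): the cleanup-handler approach described above can be sketched as follows. This is a minimal, self-contained example under stated assumptions. enter_section() and leave_section() are hypothetical stand-ins for get_multipath_config() and put_multipath_config() that only count open read-side sections, and sleep() stands in for any cancellation point inside the section.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t entered_cond = PTHREAD_COND_INITIALIZER;
static int open_sections;	/* read-side sections currently open */
static int entered;		/* set once the reader is inside its section */

/* Hypothetical stand-in for get_multipath_config(). */
static void enter_section(void)
{
	pthread_mutex_lock(&lock);
	open_sections++;
	entered = 1;
	pthread_cond_signal(&entered_cond);
	pthread_mutex_unlock(&lock);
}

/* Hypothetical stand-in for put_multipath_config(). The void * parameter
 * is the signature pthread_cleanup_push() expects. */
static void leave_section(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock);
	assert(open_sections > 0);
	open_sections--;
	pthread_mutex_unlock(&lock);
}

static void *reader(void *arg)
{
	(void)arg;
	enter_section();
	pthread_cleanup_push(leave_section, NULL);
	sleep(60);			/* cancellation point inside the section */
	pthread_cleanup_pop(1);		/* normal exit runs the handler too */
	return NULL;
}

/* Start a reader, cancel it while it is inside its section, and report
 * how many sections are still open (-1 if the thread can't be started). */
int sections_open_after_cancel(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, reader, NULL) != 0)
		return -1;

	pthread_mutex_lock(&lock);
	while (!entered)
		pthread_cond_wait(&entered_cond, &lock);
	pthread_mutex_unlock(&lock);

	pthread_cancel(t);
	pthread_join(t, NULL);
	return open_sections;
}
```

With the handler pushed, sections_open_after_cancel() reports no open sections after the cancel. Without the pthread_cleanup_push() call the counter would stay at 1, which corresponds to the reader that never leaves its critical section.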

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipathd: register threads that use rcu calls
Benjamin Marzinski [Fri, 23 Mar 2018 20:00:45 +0000 (15:00 -0500)]
multipathd: register threads that use rcu calls

All calls to condlog() are rcu reader side calls, so any thread that
uses condlog() must register itself. The only threads that are exempt
are log_thread, since it never calls condlog (or any other function that
calls get_multipath_config) and mpath_pr_event_handler_fn, which is only
called by mpath_persist, which disables the rcu handling.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agolibmultipath: Fix recently introduced inconsistencies
Bart Van Assche [Mon, 19 Mar 2018 20:27:02 +0000 (21:27 +0100)]
libmultipath: Fix recently introduced inconsistencies

Commit 48e9fd9f67bb changed libmultipath such that an int is passed
as the second argument to some print_*() calls and a pointer to
other print_*() calls. Fix these inconsistencies by changing all
call-by-reference calls into call-by-value calls.

Fixes: 48e9fd9f67bb ("libmultipath: parser: use call-by-value for "snprint" methods")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
4 weeks agoAllow the compiler to verify consistency of declarations and definitions
Bart Van Assche [Mon, 19 Mar 2018 16:23:50 +0000 (09:23 -0700)]
Allow the compiler to verify consistency of declarations and definitions

Make sure that in every source file the header file is included that
declares the functions defined in that source file. This allows the
compiler to detect inconsistencies between source and header files.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
4 weeks agomultipathd: stop waiter in __setup_multipath
Benjamin Marzinski [Fri, 16 Mar 2018 21:31:07 +0000 (16:31 -0500)]
multipathd: stop waiter in __setup_multipath

__setup_multipath can remove a multipath device from multipathd, and it
can be called either by the waiter thread or by another thread.
Previously, it dealt with this by never stopping the waiter thread.  It
simply relied on the waiter thread to notice and stop itself.  Now, when
called by another thread, it explicitly stops the waiter thread.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipathd: move __setup_multipath to multipathd
Benjamin Marzinski [Fri, 16 Mar 2018 21:31:06 +0000 (16:31 -0500)]
multipathd: move __setup_multipath to multipathd

__setup_multipath is only called from multipathd, so it shouldn't be in
libmultipath.  Move it, update_multipath (which calls it) and
set_no_path_retry (which is a helper function for it) into multipathd.
None of these functions were changed, only copied.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipathd: fix waiter thread cancelling
Benjamin Marzinski [Fri, 16 Mar 2018 21:31:05 +0000 (16:31 -0500)]
multipathd: fix waiter thread cancelling

multipathd was sending a signal to the per-device waiter thread after
cancelling it.  Since these threads are detached, there is a window
where a new thread could have started with the old thread id between the
cancel and the signalling, causing the signal to be delivered to the
wrong thread. Simply reversing the order doesn't fix the issue, since
the waiter threads exit immediately if they receive a signal, again
opening a window for the cancel to be delivered to the wrong thread.

To fix this, multipathd does reverse the order, so that it signals the
thread first (it needs to signal the thread, since the dm_task_run ioctl
isn't a cancellation point) and then cancels it. However it does this
while holding a new mutex.

The waiter thread can only exit without being cancelled for two reasons.
1. When it fails in update_multipath, which removes the device while
   holding the vecs lock.
2. If it receives a SIGUSR2 signal while waiting for a dm event.

Case 1 can never race with another thread removing the device, since
removing a device always happens while holding the vecs lock.  This
means that if the device exists to be removed, then the waiter thread
can't exit this way during the removal.

Case 2 is now solved by grabbing the new mutex after failing
dm_task_run(). With the mutex held, the thread checks if it has been
cancelled. If it wasn't cancelled, the thread continues.

The reason that this uses a new mutex, instead of the vecs lock, is that
using the vecs lock would keep the thread from ending until the vecs
lock was released.  Normally, this isn't a problem. But during
reconfigure, the waiter threads for all devices are stopped, and new
ones started, all while holding the vecs lock.  For systems with a large
number of multipath devices, this will cause multipathd to have double
its already large number of waiter threads during reconfigure, all locked
into memory.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipath-tools: remove DF arrays from HP
Xose Vazquez Perez [Thu, 15 Mar 2018 17:43:22 +0000 (18:43 +0100)]
multipath-tools: remove DF arrays from HP

Matthias did confirm that there are no such devices.

Cc: Matthias Rudolph <Matthias.Rudolph@hitachivantara.com>
Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
4 weeks agomultipath: add unit tests for dmevents code
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:45 +0000 (12:46 -0500)]
multipath: add unit tests for dmevents code

These unit tests do not get complete code coverage. Also, they don't
check for memory errors. To do this through unit tests, instead of
using valgrind, would require adding unit test specific compilation
defines to the code, and compiling a separate unit-test version.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipathd: add new polling dmevents waiter thread
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:44 +0000 (12:46 -0500)]
multipathd: add new polling dmevents waiter thread

The current method of waiting for dmevents on multipath devices involves
creating a separate thread for each device. This can become very
wasteful when there are large numbers of multipath devices. Also, since
multipathd needs to grab the vecs lock to update the devices, the
additional threads don't actually provide much parallelism.

The patch adds a new method of updating multipath devices on dmevents,
which uses the new device-mapper event polling interface. This means
that there is only one dmevent waiting thread which will wait for events
on all of the multipath devices.  Currently the code to get the event
number from the list of device names and to re-arm the polling interface
is not in libdevmapper, so the patch does that work. Obviously, these
bits need to go into libdevmapper, so that multipathd can use a standard
interface.

I haven't touched any of the existing event waiting code, since event
polling was only added to device-mapper in version 4.37.0.  multipathd
checks this version, and defaults to using the polling code if
device-mapper supports it. This can be overridden by running multipathd
with "-w", to force it to use the old event waiting code.
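
Illustration (not part of the patch): the decision described above amounts to a version comparison plus an override. A minimal sketch, assuming the device-mapper version is already available as a (major, minor, patch) triple. The function names are hypothetical. Only the 4.37.0 threshold and the "-w" override come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if version v (major, minor, patch) is at least min. */
bool version_at_least(const unsigned int v[3], const unsigned int min[3])
{
	assert(v != NULL && min != NULL);
	for (int i = 0; i < 3; i++) {
		if (v[i] != min[i])
			return v[i] > min[i];
	}
	return true;	/* all three fields are equal */
}

/* Hypothetical policy helper: use the polling waiter unless the old
 * per-device waiters were forced (the "-w" case) or device-mapper is
 * older than 4.37.0. */
bool use_dmevent_polling(const unsigned int dm_version[3],
			 bool force_old_waiters)
{
	static const unsigned int polling_min[3] = { 4, 37, 0 };

	return !force_old_waiters &&
	       version_at_least(dm_version, polling_min);
}
```

Comparing field by field avoids the pitfalls of packing the triple into one integer when a field grows beyond the assumed width.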

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agolibmultipath: add helper functions
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:43 +0000 (12:46 -0500)]
libmultipath: add helper functions

Add the ability to reset a vector without completely freeing it, and to
check the version of the device-mapper module.  The existing version
checking code checks the version of a specific device mapper target, and
has been renamed for clarity's sake. These functions will be used in a
later patch.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agocall start_waiter_thread() before setup_multipath()
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:42 +0000 (12:46 -0500)]
call start_waiter_thread() before setup_multipath()

If setup_multipath() is called before the waiter thread has started,
there is a window where a dm event can occur between when
setup_multipath() updates the device state and when the waiter thread
starts waiting for new events, causing the new event to be missed and
the multipath device to not get updated.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomove waiter code from libmultipath to multipathd
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:41 +0000 (12:46 -0500)]
move waiter code from libmultipath to multipathd

Only multipathd uses the code in waiter.[ch] and the functions that call
it directly, so they should all live in the multipathd directory.  This
patch is simply moving the waiter.[ch] files and the functions in
structs_vec that use them. None of the moved code has been changed.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agolibmultipath: move remove_map waiter code to multipathd
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:40 +0000 (12:46 -0500)]
libmultipath: move remove_map waiter code to multipathd

Only multipathd needs to worry about the multipath waiter code. There is
no point in having remove_map_and_stop_waiter() or
remove_maps_and_stop_waiters() in libmultipath, since they should never
be used outside of multipathd.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipathd: use nanosleep for strict timing
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:39 +0000 (12:46 -0500)]
multipathd: use nanosleep for strict timing

In order to safely use SIGALRM in a multi-threaded program, only one
thread can schedule and wait on SIGALRM at a time. All other threads
must have SIGALRM blocked, and be unable to schedule an alarm. The
strict_timing code in checkerloop was unblocking SIGALRM, and calling
setitimer(), without any locking.  Instead, it should use nanosleep()
to sleep for the correct length of time, since that doesn't depend on or
interfere with signals.
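
Illustration (not part of the patch): a signal-free way to pad a loop iteration to a fixed tick might look like the sketch below. The helper names and the use of CLOCK_MONOTONIC are assumptions made for this example and are not taken from checkerloop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Compute how much of a tick of tick_sec seconds, started at *start, is
 * left at time *now. Returns 1 and fills *out if time remains, otherwise
 * returns 0 and zeroes *out. */
int remaining_in_tick(const struct timespec *start,
		      const struct timespec *now,
		      time_t tick_sec, struct timespec *out)
{
	time_t sec;
	long nsec;

	assert(start != NULL && now != NULL && out != NULL);
	sec = start->tv_sec + tick_sec - now->tv_sec;
	nsec = start->tv_nsec - now->tv_nsec;
	if (nsec < 0) {
		nsec += NSEC_PER_SEC;
		sec -= 1;
	}
	if (sec < 0) {		/* the tick has already been overrun */
		out->tv_sec = 0;
		out->tv_nsec = 0;
		return 0;
	}
	out->tv_sec = sec;
	out->tv_nsec = nsec;
	return sec > 0 || nsec > 0;
}

/* Sleep until the tick that began at *start is over. No SIGALRM and no
 * timer are involved. Returns 0 on success, -1 on error. */
int sleep_rest_of_tick(const struct timespec *start, time_t tick_sec)
{
	struct timespec now, rem;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	if (!remaining_in_tick(start, &now, tick_sec, &rem))
		return 0;
	/* On EINTR nanosleep() stores the unslept time back into rem. */
	while (nanosleep(&rem, &rem) != 0) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}
```

A caller would record the start time with clock_gettime(CLOCK_MONOTONIC, ...) at the top of each iteration and call sleep_rest_of_tick() at the bottom.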

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agolibmultipath: fix log_pthread processing
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:38 +0000 (12:46 -0500)]
libmultipath: fix log_pthread processing

log_pthread() was waiting for notification on logev_cond, without
checking if it had already happened.  This means it could end up
waiting, while there is work that it should be doing.
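
Illustration (not part of the patch): the usual remedy for this kind of missed wakeup is to guard the wait with a predicate. A minimal sketch follows. Only logev_cond is named in the text above. The mutex, the flag and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t logev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t logev_cond = PTHREAD_COND_INITIALIZER;
static int log_messages_pending;	/* predicate, guarded by logev_lock */

/* Producer side: record that there is work, then notify. */
void post_log_event(void)
{
	pthread_mutex_lock(&logev_lock);
	log_messages_pending = 1;
	pthread_cond_signal(&logev_cond);
	pthread_mutex_unlock(&logev_lock);
}

/* Consumer side: block only while no event has been posted. If the
 * notification already happened, the loop body never runs and the call
 * returns at once. */
void wait_for_log_event(void)
{
	pthread_mutex_lock(&logev_lock);
	while (!log_messages_pending)
		pthread_cond_wait(&logev_cond, &logev_lock);
	assert(log_messages_pending == 1);
	log_messages_pending = 0;	/* consume the event */
	pthread_mutex_unlock(&logev_lock);
}
```

The loop also covers spurious wakeups, which pthread_cond_wait() is allowed to produce.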

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipathd: log thread cleanup
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:37 +0000 (12:46 -0500)]
multipathd: log thread cleanup

The function log_thread_flush() is an exact copy of flush_logqueue(), so
both clearly don't need to exist. Also, there is no reason to make all
of the log thread variables global.  The only time any of them were
being used outside of log_thread.c, was to reset the log.  This code
should never be run if the log_thread isn't running, so it makes more
sense to live inside of log_thread.c

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agolibmultipath: set dm_conf_verbosity
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:36 +0000 (12:46 -0500)]
libmultipath: set dm_conf_verbosity

dm_conf_verbosity was created to keep dm_write_log from needing
access to the multipath config. However it never was set.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agolibmultipath: fix basenamecpy
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:35 +0000 (12:46 -0500)]
libmultipath: fix basenamecpy

basenamecpy was returning the wrong answer in multiple cases, as
shown by the unit tests for it. Now it will properly find the
basename (as defined by GNU basename, which works well for all of
multipath's uses) and return a copy, if the basename can fit in the
provided buffer.
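
Illustration (not the libmultipath implementation): a simplified sketch of the behaviour described above, assuming "basename" means everything after the last '/' and that a result which does not fit is reported as length 0. The name basename_copy_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the part of src after its last '/' into dst (size bytes).
 * Returns the length copied. Returns 0, leaving dst empty, if there is
 * no basename or if it does not fit together with its NUL terminator. */
size_t basename_copy_sketch(const char *src, char *dst, size_t size)
{
	const char *base;
	size_t len;

	assert(dst != NULL && size > 0);	/* caller contract */
	dst[0] = '\0';
	if (src == NULL)
		return 0;

	base = strrchr(src, '/');
	base = base ? base + 1 : src;
	len = strlen(base);
	if (len == 0 || len >= size)
		return 0;

	memcpy(dst, base, len + 1);	/* includes the terminating NUL */
	return len;
}
```

For a path such as "/dev/sda" this yields "sda", and a path ending in '/' yields nothing, as the GNU variant of basename does.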

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agoUnit tests for basenamecpy
Benjamin Marzinski [Wed, 14 Mar 2018 17:46:34 +0000 (12:46 -0500)]
Unit tests for basenamecpy

The current implementation of basenamecpy is broken, so some of these
tests currently fail. Fixes to follow.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
4 weeks agomultipath-tools: fix errors in auto generated man pages
Xose Vazquez Perez [Mon, 12 Mar 2018 20:52:43 +0000 (21:52 +0100)]
multipath-tools: fix errors in auto generated man pages

dmmp_path_blk_name_get.3:
<standard input>:14: warning: macro `nvme0n1'.' not defined

dmmp_mpath_kdev_name_get.3:
<standard input>:15: warning: macro `dm-1'.' not defined

Cc: Gris Ge <fge@redhat.com>
Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
6 weeks agoBump version to 0.7.6 0.7.6
Christophe Varoqui [Sat, 10 Mar 2018 07:28:38 +0000 (08:28 +0100)]
Bump version to 0.7.6

6 weeks agomultipath-tools: fix misspellings
Xose Vazquez Perez [Thu, 8 Mar 2018 23:08:48 +0000 (00:08 +0100)]
multipath-tools: fix misspellings

Done with https://github.com/lucasdemarchi/codespell

Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
6 weeks agomultipath-tools: refresh kernel-doc from kernel sources
Xose Vazquez Perez [Thu, 8 Mar 2018 19:34:40 +0000 (20:34 +0100)]
multipath-tools: refresh kernel-doc from kernel sources

Bugs fixed and "get rid of unused output formats"
b05142675310d2ac80276569e151742f880e3ec3:

Since there isn't any docbook code anymore upstream,
we can get rid of several output formats:

- docbook/xml, html, html5 and list formats were used by
  the old build system;
- As ReST is text, there's not much sense on outputting
  on a different text format.

After this patch, only man and rst output formats are
supported.

Cc: Gris Ge <fge@redhat.com>
Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
6 weeks agomultipath.conf(5): improve syntax documentation
Martin Wilck [Wed, 7 Mar 2018 23:26:20 +0000 (00:26 +0100)]
multipath.conf(5): improve syntax documentation

Describe the syntax of attribute / value pairs, comments, and quoted
strings, as well as the peculiarities of section beginnings and ends.
Also describe the newly added '""' feature.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: config parser: fix corner case for double quotes
Martin Wilck [Wed, 7 Mar 2018 23:26:19 +0000 (00:26 +0100)]
libmultipath: config parser: fix corner case for double quotes

Corner cases of the previous patch are strings starting with a double quote,
such as '"prepended to itself is false" prepended to itself is false' or
'"" is the empty string', and in particular, the string '"' ("\"" in C
notation), which is indistinguishable from the "QUOTE" token in the parsed strvec.

This patch fixes that by introducing a special token that can't occur as part
of a normal string to indicate the beginning and end of a quoted string.

'"' is admittedly not a very likely keyword value for multipath.conf, but
a) this is a matter of correctness, b) we didn't think of '2.5"' before, either, and
c) the (*str != '"') expressions would need to be patched anyway to fix the
'string starting with "' case.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: config parser: Allow '"' in strings
Martin Wilck [Wed, 7 Mar 2018 23:26:18 +0000 (00:26 +0100)]
libmultipath: config parser: Allow '"' in strings

We have seen model strings like '2.5" SSD' which can't be parsed
by the current config parser. This patch fixes this by allowing
'""' to represent a double quote character inside a string.
The above model string could now be entered in the config file like this:

blacklist {
  vendor SomeCorp
  product "2.5"" SSD"
}
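
Illustration (not the parser code from this patch): the '""' rule can be sketched as a small decoder. The function name and the error convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a double-quoted value that starts at raw[0] == '"'. Inside the
 * quotes, "" stands for one literal double quote. Anything after the
 * closing quote is ignored. Returns the decoded length, or -1 if raw is
 * not a terminated quoted string or dst (size bytes) is too small. */
int decode_quoted_value(const char *raw, char *dst, size_t size)
{
	size_t n = 0;
	const char *p;

	assert(raw != NULL && dst != NULL);
	if (raw[0] != '"')
		return -1;

	for (p = raw + 1; *p != '\0'; p++) {
		if (*p == '"') {
			if (p[1] != '"') {	/* closing quote */
				if (n >= size)
					return -1;
				dst[n] = '\0';
				return (int)n;
			}
			p++;	/* "" pair: skip one, copy the other */
		}
		if (n + 1 >= size)
			return -1;
		dst[n++] = *p;
	}
	return -1;	/* ran off the end without a closing quote */
}
```

Fed the product value from the blacklist example above, the decoder produces the eight characters 2.5" SSD.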

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: config parser: don't strip whitespace between quotes
Martin Wilck [Wed, 7 Mar 2018 23:26:17 +0000 (00:26 +0100)]
libmultipath: config parser: don't strip whitespace between quotes

Between double quotes, the parser currently strips leading (but not
trailing) whitespace. That's inconsistent and unexpected. Fix it.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agotests: add unit tests for config file parser
Martin Wilck [Wed, 7 Mar 2018 23:26:16 +0000 (00:26 +0100)]
tests: add unit tests for config file parser

Add test cases for parsing the config file.

Some of these tests currently fail. The patches that follow fix them.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: uev_update_path: update path properties
Martin Wilck [Wed, 7 Mar 2018 23:21:52 +0000 (00:21 +0100)]
libmultipath: uev_update_path: update path properties

Update pp->udev and those path attributes that can be cheaply
updated from sysfs, i.e. without IO to the disk.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: uev_update_path: always warn if WWID changed
Martin Wilck [Wed, 7 Mar 2018 23:21:51 +0000 (00:21 +0100)]
libmultipath: uev_update_path: always warn if WWID changed

Print the warning about changed WWID not only if disable_changed_wwids
is set, but always. It's actually more dangerous if that option is
not set.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: get_uid: don't quit prematurely without udev
Martin Wilck [Wed, 7 Mar 2018 23:21:50 +0000 (00:21 +0100)]
libmultipath: get_uid: don't quit prematurely without udev

Not all the implemented methods to derive the UID rely on udev
information being present. For example getuid callout, rbd,
and the SCSI vpd code work fine without it. It's unlikely that
we don't get udev data, but we want to be as good as possible
at deriving the uid.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: get_uid: check VPD pages for SCSI only
Martin Wilck [Wed, 7 Mar 2018 23:21:49 +0000 (00:21 +0100)]
libmultipath: get_uid: check VPD pages for SCSI only

The VPD code won't work for non-SCSI devices, anyway. For indentation
reasons, I moved the "retrigger_tries" case to a separate function,
which is also called only for SCSI devices.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: remove FREE_CONST() again
Martin Wilck [Wed, 7 Mar 2018 23:15:51 +0000 (00:15 +0100)]
libmultipath: remove FREE_CONST() again

The FREE_CONST macro is of questionable value, as reviewers have pointed
out. The users of this macro were mostly functions that called
uevent_get_dm_xyz(). But these functions don't need to return const char*,
as they allocate the strings they return. So my change of the prototype
was wrong. This patch reverts it. The few other users of FREE_CONST can
also be reverted to use char* instead of const char* with negligible risk.

Fixes: "libmultipath: fix compiler warnings for -Wcast-qual"
Fixes: "libmultipath: const qualifier for wwid and alias"

(Note: this reverts changes not committed upstream. But as these changes are
deeply in the middle of my large-ish series of patches, it's probably easier
to simply add this patch on top than to rebase the whole series).

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: fix wrong output of "multipath -t"
Martin Wilck [Wed, 7 Mar 2018 23:15:50 +0000 (00:15 +0100)]
libmultipath: fix wrong output of "multipath -t"

The default values printed by "multipath -t" or "multipathd show config"
for "detect_prio", "detect_checker", and "retain_attached_hw_handler"
don't match the actual compiled-in defaults. Moreover, several other
options would also be displayed wrongly if the defaults were changed.

Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agoIntroduce the libmultipath/unaligned.h header file
Bart Van Assche [Wed, 7 Mar 2018 23:15:49 +0000 (00:15 +0100)]
Introduce the libmultipath/unaligned.h header file

This patch avoids that Coverity reports the following for the code
in libmultipath/prioritizers/alua_rtpg.c:

   CID 173256:  Integer handling issues  (SIGN_EXTENSION)
    Suspicious implicit sign extension: "buf[0]" with type "unsigned char" (8 bits, unsigned) is promoted in "((buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3]) + 4" to type "int" (32 bits, signed), then sign-extended to type "unsigned long" (64 bits, unsigned).  If "((buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3]) + 4" is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.
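
Illustration (not the header added by this patch): the sign extension goes away once each byte is widened to an unsigned 32-bit type before shifting. A minimal sketch with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Read a big-endian 32-bit value from a possibly unaligned buffer. Each
 * byte is converted to uint32_t before the shift, so nothing is promoted
 * to a signed int along the way. */
static inline uint32_t get_be32_sketch(const void *p)
{
	const uint8_t *b = p;

	assert(b != NULL);
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* The flagged pattern: a 4-byte length field plus the 4 bytes of the
 * field itself, computed in 64 bits without sign extension. */
uint64_t length_plus_header(const unsigned char *buf)
{
	return (uint64_t)get_be32_sketch(buf) + 4;
}
```

With a first byte of 0x80 the result stays a small positive 64-bit number. In the flagged expression the upper 32 bits would all become 1.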

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: Fix sgio_get_vpd()
Bart Van Assche [Wed, 7 Mar 2018 23:15:48 +0000 (00:15 +0100)]
libmultipath: Fix sgio_get_vpd()

Pass the VPD page number to sgio_get_vpd() such that the page needed
by the caller is queried instead of page 0x83. Fix the statement that
computes the length of the page returned by do_inq(). Fix the return
code check in the caller of sgio_get_vpd().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agokpartx: Improve reliability of find_loop_by_file()
Bart Van Assche [Wed, 7 Mar 2018 23:15:47 +0000 (00:15 +0100)]
kpartx: Improve reliability of find_loop_by_file()

Avoid that the strchr() call in this function examines uninitialized
data on the stack. This patch avoids that Coverity reports the following:

    CID 173252:  Error handling issues  (CHECKED_RETURN)
    "read(int, void *, size_t)" returns the number of bytes read, but it is ignored.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
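
Illustration for the find_loop_by_file() fix above (not the kpartx code itself): a sketch of reading a short text value and only scanning it with strchr() after the read() result has been checked and the buffer terminated. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Read at most size - 1 bytes from fd into buf, NUL-terminate the data
 * and cut it at the first newline. Returns the number of bytes kept, or
 * -1 (with buf emptied) if read() fails. */
ssize_t read_first_line(int fd, char *buf, size_t size)
{
	ssize_t n;
	char *nl;

	assert(buf != NULL && size > 0);
	do {
		n = read(fd, buf, size - 1);
	} while (n < 0 && errno == EINTR);

	if (n < 0) {
		buf[0] = '\0';
		return -1;
	}
	buf[n] = '\0';		/* only now is it safe to scan buf */
	nl = strchr(buf, '\n');
	if (nl != NULL)
		*nl = '\0';
	return (ssize_t)strlen(buf);
}
```

Because the terminator is written at the position read() reported, strchr() never looks at stack bytes that were not filled in.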
6 weeks agolibmultipath, alloc_path_with_pathinfo(): Ensure that pp->wwid is '\0'-terminated
Bart Van Assche [Wed, 7 Mar 2018 23:15:46 +0000 (00:15 +0100)]
libmultipath, alloc_path_with_pathinfo(): Ensure that pp->wwid is '\0'-terminated

Discovered by Coverity (CID 173257).

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: enable feature disable changed wwid by default
Chongyun Wu [Wed, 7 Mar 2018 23:15:45 +0000 (00:15 +0100)]
libmultipath: enable feature disable changed wwid by default

Enable the disable_changed_wwids feature by default.

Signed-off-by: Chongyun Wu <wu.chongyun@h3c.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agomultipathd: add lock protection for cli_list_status
Chongyun Wu [Wed, 7 Mar 2018 23:15:44 +0000 (00:15 +0100)]
multipathd: add lock protection for cli_list_status

cli_list_status accesses vecs->pathvec, which must be protected by a
lock; otherwise it might read inconsistent data or cause other
problems.

Signed-off-by: Chongyun Wu <wu.chongyun@h3c.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agomultipath-tools: reformat and update comments in hwtable
Xose Vazquez Perez [Wed, 7 Mar 2018 23:10:00 +0000 (00:10 +0100)]
multipath-tools: reformat and update comments in hwtable

Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
6 weeks agomultipath-tools: move Nimble and SGI to HPE section
Xose Vazquez Perez [Wed, 7 Mar 2018 23:09:59 +0000 (00:09 +0100)]
multipath-tools: move Nimble and SGI to HPE section

They were absorbed by HPE some time ago.

Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
6 weeks agomultipath-tools: build: prevent intermediate file deletion
Martin Wilck [Wed, 7 Mar 2018 23:08:59 +0000 (00:08 +0100)]
multipath-tools: build: prevent intermediate file deletion

By default, "make" removes intermediate files from implicit rules
if they are the only dependency. Prevent that by using .SECONDARY.
Otherwise some files will be re-built upon second invocation of "make".

Fixes: e39283ebd79b "multipath-tools: add dependency tracking to Makefiles"
Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agomultipath: fix clang warning in delegate_to_multipathd
Martin Wilck [Wed, 7 Mar 2018 23:08:58 +0000 (00:08 +0100)]
multipath: fix clang warning in delegate_to_multipathd

Fixes this warning from clang:

main.c:628:11: warning: variable 'reply' is used uninitialized
whenever 'if' condition is true [-Wsometimes-uninitialized]
...
main.c:609:32: note: initialize the variable 'reply' to silence this warning

Fixes: 506d253b7f89 "multipath: delegate dangerous commands to multipathd"
Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agomultipathd: fix -Wpointer-to-int-cast warning in uxlsnr
Martin Wilck [Wed, 7 Mar 2018 23:08:57 +0000 (00:08 +0100)]
multipathd: fix -Wpointer-to-int-cast warning in uxlsnr

Fixes: "multipathd: release uxsocket and resource when cancel thread"
Signed-off-by: Martin Wilck <mwilck@suse.com>
6 weeks agolibmultipath: fix crash on shutdown if io_err thread isn't running
Martin Wilck [Wed, 7 Mar 2018 23:08:56 +0000 (00:08 +0100)]
libmultipath: fix crash on shutdown if io_err thread isn't running

If we've never created the io_error checker thread, we shouldn't
cancel it.

Fixes: 160da9fa4339 "multipathd: start marginal path checker thread
lazily"

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipath-tools: add info about how to get a release directly from gitweb
Xose Vazquez Perez [Fri, 12 Jan 2018 16:56:41 +0000 (17:56 +0100)]
multipath-tools: add info about how to get a release directly from gitweb

gitweb is able to extract and serve a release right away.

Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
7 weeks agoBump version to 0.7.5 0.7.5
Christophe Varoqui [Wed, 7 Mar 2018 09:52:25 +0000 (10:52 +0100)]
Bump version to 0.7.5

7 weeks agomultipathd: start marginal path checker thread lazily
Martin Wilck [Tue, 6 Mar 2018 21:18:42 +0000 (22:18 +0100)]
multipathd: start marginal path checker thread lazily

I noticed that the io_error checker thread accounts for most of the
activity of multipathd even if the marginal path checking parameters
are not set (which is still the default in most installations I assume).

Therefore, start the io_error checker thread only if there's at least
one map with marginal error path checking configured. Also, make sure
the thread is really up when start_io_err_stat_thread() returns.

This requires adding a "vecs" argument to setup_map, because vecs
needs to be passed to the io_error checking code.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: fix race in stop_io_err_stat_thread
Martin Wilck [Tue, 6 Mar 2018 21:18:41 +0000 (22:18 +0100)]
libmultipath: fix race in stop_io_err_stat_thread

It's wrong, and unnecessary, to call pthread_kill after
pthread_cancel. I have observed cases where the io_err checker
thread hung in libpthread after receiving the USR2 signal, in particular
when multipathd is run under strace. (If multipathd is killed with
SIGINT under strace, and the io_error thread is running, it happens
almost every time). If this happens, the io_err thread
tries to obtain a mutex in the urcu code (presumably rcu_unregister_thread())
and the main thread hangs in pthread_join().

With the change from this patch, the thread is shut down cleanly. I haven't
observed the hang under strace with the patch.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipathd: fix signal blocking logic
Martin Wilck [Mon, 5 Mar 2018 23:15:07 +0000 (00:15 +0100)]
multipathd: fix signal blocking logic

multipathd is supposed to block all signals in all threads, except
the uxlsnr thread which handles termination and reconfiguration
signals (SIGUSR1) in its ppoll() call, SIGUSR2 in the waiter thread
and the marginal path checker thread, and occasional SIGALRM. The current
logic does exactly the opposite: it blocks termination signals in ppoll() and
allows multipathd to be killed, e.g. by SIGALRM.

Fix that by inverting the logic. The argument to pthread_sigmask and
ppoll is the set of *blocked* signals, not vice versa.

The marginal paths code needs to unblock SIGUSR2 now explicitly, as
the dm-event waiter code already does. Doing this with pselect()
avoids asynchronous cancellation.
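
Illustration (not the code from this patch): since both pthread_sigmask() and ppoll() take the set of signals to keep blocked, a mask for the listening thread is built by starting from "everything" and removing the signals it should receive. The function names are hypothetical. The four unblocked signals are the ones named in the second Fixes: entry below.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Fill *mask with the signals to keep BLOCKED while waiting in ppoll():
 * every signal except the ones this thread is supposed to handle. */
void build_ppoll_mask(sigset_t *mask)
{
	assert(mask != NULL);
	sigfillset(mask);
	sigdelset(mask, SIGINT);
	sigdelset(mask, SIGTERM);
	sigdelset(mask, SIGHUP);
	sigdelset(mask, SIGUSR1);
}

/* Block every signal in the calling thread. Threads created afterwards
 * inherit this mask. Returns 0 or an error number. */
int block_all_signals(void)
{
	sigset_t all;

	sigfillset(&all);
	return pthread_sigmask(SIG_BLOCK, &all, NULL);
}
```

A thread would call block_all_signals() early and later pass the mask from build_ppoll_mask() as the last argument of ppoll(), so the four signals can only arrive while it waits there.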

Fixes: 810082e "libmultipath, multipathd: Rework SIGPIPE handling"
Fixes: 534ec4c "multipathd: Ensure that SIGINT, SIGTERM, SIGHUP and SIGUSR1
are delivered to the uxsock thread"

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipathd: update path group prio in check_path
Martin Wilck [Mon, 5 Mar 2018 23:15:06 +0000 (00:15 +0100)]
multipathd: update path group prio in check_path

The previous patch "libmultipath: don't update path groups when printing"
removed the call to path_group_prio_update() in the printing code path.
To compensate for that, recalculate path group prio also when it's not
strictly necessary (i.e. if failback "manual" is set).

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: foreign/nvme: implement path display
Martin Wilck [Mon, 5 Mar 2018 23:15:05 +0000 (00:15 +0100)]
libmultipath: foreign/nvme: implement path display

implement display of path information for NVMe foreign paths and maps.
With this patch, I get output like this for Linux NVMe soft targets:

multipathd show topology
nvme-subsys0:NQN:subsysname (uuid.96926ba3-b207-437c-902c-4a4df6538c3f) [nvme] nvme0n1 NVMe,Linux,4.15.0-r
size=2097152 features='n/a' hwhandler='n/a' wp=rw
`-+- policy='n/a' prio=n/a status=n/a
  |- 0:1:1 nvme0c1n1 0:0 n/a n/a live
  |- 0:2:1 nvme0c2n1 0:0 n/a n/a live
  |- 0:3:1 nvme0c3n1 0:0 n/a n/a live
  `- 0:4:1 nvme0c4n1 0:0 n/a n/a live

multipathd show paths format '%G %d %i %o %z %m %N'
foreign dev       hcil  dev_st serial           multipath host WWNN
[nvme]  nvme0c1n1 0:1:1 live   1c2c86659503a02f nvme0n1   rdma:traddr=192.168.201.101,trsvcid=4420
[nvme]  nvme0c2n1 0:2:1 live   1c2c86659503a02f nvme0n1   rdma:traddr=192.168.202.101,trsvcid=4420
[nvme]  nvme0c3n1 0:3:1 live   1c2c86659503a02f nvme0n1   rdma:traddr=192.168.203.101,trsvcid=4420
[nvme]  nvme0c4n1 0:4:1 live   1c2c86659503a02f nvme0n1   rdma:traddr=192.168.204.101,trsvcid=4420

(admittedly, I abused the 'WWNN' wildcard here a bit to display information
which is helpful for NVMe over RDMA).

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipathd: use foreign API
Martin Wilck [Mon, 5 Mar 2018 23:15:04 +0000 (00:15 +0100)]
multipathd: use foreign API

Call into the foreign library code when paths are discovered, uevents
are received, and in the checker loop. Furthermore, use the foreign
code to print information in the "multipathd show paths", "multipathd
show maps", and "multipathd show topology" client commands.

We don't support foreign data in the individual "show map" and "show path"
commands, and neither in the "json" commands. The former is a deliberate
decision, the latter could be added if desired.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipath: use foreign API
Martin Wilck [Mon, 5 Mar 2018 23:15:03 +0000 (00:15 +0100)]
multipath: use foreign API

Use the "foreign" code to print information about multipath maps
owned by foreign libraries in print mode (multipath -ll, -l).

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: pathinfo: call into foreign library
Martin Wilck [Mon, 5 Mar 2018 23:15:02 +0000 (00:15 +0100)]
libmultipath: pathinfo: call into foreign library

This actually enables the use of foreign paths.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath/foreign: nvme foreign library
Martin Wilck [Mon, 5 Mar 2018 23:15:01 +0000 (00:15 +0100)]
libmultipath/foreign: nvme foreign library

This still contains stubs for path handling and checking, but it's functional
for printing already.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath/print: add "%G - foreign" wildcard
Martin Wilck [Mon, 5 Mar 2018 23:15:00 +0000 (00:15 +0100)]
libmultipath/print: add "%G - foreign" wildcard

This adds a format field to identify foreign maps as such, and
uses it in default-formatted topology output (generic_style()).

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: API for foreign multipath handling
Martin Wilck [Mon, 5 Mar 2018 23:14:59 +0000 (00:14 +0100)]
libmultipath: API for foreign multipath handling

Add an API for "foreign" multipaths. Foreign libraries are loaded
from ${multipath_dir}/libforeign-*.so, as we do for checkers.

Refer to "foreign.h" for details about the API itself. Like we do for
checkers, high-level multipath code isn't supposed to call the API directly,
but rather the wrapper functions declared in "foreign.h".

This API is used only for displaying information and for logging. An extension to
other functionality (such as monitoring or administration) might be feasible,
but is not planned.

Foreign libraries communicate with libmultipath through the API defined in
"foreign.h". The foreign library can implement multipath maps, pathgroups,
and paths as it likes, they just need to provide the simple interfaces
defined in "generic.h" to libmultipath. These interfaces are used in libmultipath's
"print" implementation to convey various bits of information to users. By
using the same interfaces for printing that libmultipath uses internally,
foreign library implementations can focus on the technical side without
worrying about output formatting compatibility.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: print: use generic API for get_x_layout()
Martin Wilck [Mon, 5 Mar 2018 23:14:58 +0000 (00:14 +0100)]
libmultipath: print: use generic API for get_x_layout()

Introduce new functions _get_path_layout and _get_multipath_layout
using the new "generic" API to determine field widths, and map the
old API to them.

Furthermore, replace the boolean "header" by an enum with 3 possible
values. The new value LAYOUT_RESET_NOT allows calling the get_x_layout
function several times and determine the overall field width.
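
A rough sketch of how a three-valued reset mode lets a caller accumulate one
field width over several calls. Only LAYOUT_RESET_NOT is named in this commit
message; the other two enum values and update_width() are assumptions made
for illustration:

```c
#include <string.h>

/* LAYOUT_RESET_NOT is from the commit message; the other names are assumed. */
enum layout_reset {
	LAYOUT_RESET_NOT,	/* keep the width computed by earlier calls */
	LAYOUT_RESET_ZERO,	/* start from 0 (no header line) */
	LAYOUT_RESET_HEADER,	/* start from the width of the header text */
};

/* Return the field width needed for `values`, honouring the reset mode. */
size_t update_width(size_t width, enum layout_reset reset, const char *header,
		    const char *const *values, size_t n)
{
	size_t i;

	if (reset == LAYOUT_RESET_ZERO)
		width = 0;
	else if (reset == LAYOUT_RESET_HEADER)
		width = strlen(header);

	for (i = 0; i < n; i++) {
		size_t len = strlen(values[i]);

		if (len > width)
			width = len;
	}
	return width;
}
```

Calling it once with a resetting mode and then again with LAYOUT_RESET_NOT for
a second set of objects yields the overall width for both sets.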

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: print: convert API to generic data type
Martin Wilck [Mon, 5 Mar 2018 23:14:57 +0000 (00:14 +0100)]
libmultipath: print: convert API to generic data type

Convert higher level API (snprint_multipath_topology() etc) to
using the generic multipath API. This will allow "foreign"
multipath objects that implement the generic API to be printed
exactly like native multipathd objects.

The previous API (using "struct multipath*" and "struct path") remains
in place through macros mapping to the new functions. By doing this
and testing in regular setups, it's easily verified that the new
API works and produces the same results.

Moreover, abstract out the code to determine the output format from multipath
properties into snprint_multipath_style(), to be able to use it as generic
->style() method.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: "generic multipath" interface
Martin Wilck [Mon, 5 Mar 2018 23:14:56 +0000 (00:14 +0100)]
libmultipath: "generic multipath" interface

This patch adds a simplified abstract interface to the multipath data structures.
The idea is to allow "foreign" data structures to be treated by libmultipath
if they implement the same interface. Currently, the intention is to use this
only to provide formatted output from this interface.

This interface assumes only that the data structure is organized in maps
containing path groups containing paths, and that formatted printing (using
the wildcards defined in libmultipath) is possible on each level of the data
structure.

The patch also implements the interface for the internal dm_multipath data
structure.

The style() method looks a bit exotic, but it's necessary because
print_multipath_topology() uses different formats depending on the mpp
properties. This needs to be in the generic interface, too, if we want to
produce identical output.
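
To illustrate the idea of an abstract interface that foreign data structures
can implement, here is a much reduced sketch. All names (gen_map, gen_map_ops,
demo_map, print_map_field) and the meaning of the 'n' wildcard are
hypothetical; the real interface has more levels and methods:

```c
#include <stdio.h>

struct gen_map;

/* Hypothetical method table with a single formatted-print method. */
struct gen_map_ops {
	int (*snprint)(const struct gen_map *map, char *buf, size_t len,
		       char wildcard);
};

/* Generic handle: the printing code only ever sees this type. */
struct gen_map {
	const struct gen_map_ops *ops;
};

/* A "foreign" map embedding the generic handle as its first member. */
struct demo_map {
	struct gen_map gen;
	const char *name;
};

/* Print the field selected by `wildcard`; unknown fields print "n/a". */
int demo_snprint(const struct gen_map *map, char *buf, size_t len,
		 char wildcard)
{
	/* Valid because `gen` is the first member of struct demo_map. */
	const struct demo_map *dm = (const struct demo_map *)map;

	if (wildcard == 'n')
		return snprintf(buf, len, "%s", dm->name);
	return snprintf(buf, len, "n/a");
}

const struct gen_map_ops demo_ops = { .snprint = demo_snprint };

/* Generic caller: works for any object that provides the ops table. */
int print_map_field(const struct gen_map *map, char *buf, size_t len,
		    char wildcard)
{
	return map->ops->snprint(map, buf, len, wildcard);
}
```

The generic caller never needs to know the concrete type, which is what allows
native and foreign objects to be printed by the same code.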

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: add vector_convert()
Martin Wilck [Mon, 5 Mar 2018 23:14:55 +0000 (00:14 +0100)]
libmultipath: add vector_convert()

This is a handy helper for creating one vector from another,
mapping each element of the origin vector to an element of
the target vector with a given conversion function. It can
also be used to "concatenate" vectors by passing in a non-NULL first
argument.
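
The behaviour described above can be sketched with a simplified pointer
vector. This is not libmultipath's vector type or the actual helper;
struct ptr_vec, ptr_vec_convert() and conv_identity() are hypothetical names:

```c
#include <stdlib.h>

/* Hypothetical minimal pointer vector; libmultipath's own type differs. */
struct ptr_vec {
	void **slot;
	size_t len;
};

/* Example conversion function: maps each element to itself. */
void *conv_identity(void *item)
{
	return item;
}

/*
 * Append conv(item) for every item of `from` to `to`.
 * If `to` is NULL, a new vector is allocated; otherwise the converted
 * items are appended, which "concatenates" the two vectors.
 * Returns the target vector, or NULL on allocation failure.
 */
struct ptr_vec *ptr_vec_convert(struct ptr_vec *to, const struct ptr_vec *from,
				void *(*conv)(void *))
{
	struct ptr_vec *out = to;
	void **grown;
	size_t i;

	if (!out) {
		out = calloc(1, sizeof(*out));
		if (!out)
			return NULL;
	}
	if (from->len == 0)
		return out;

	grown = realloc(out->slot, (out->len + from->len) * sizeof(*grown));
	if (!grown) {
		if (!to)
			free(out);
		return NULL;
	}
	out->slot = grown;
	for (i = 0; i < from->len; i++)
		out->slot[out->len + i] = conv(from->slot[i]);
	out->len += from->len;
	return out;
}
```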

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: add vector_free_const()
Martin Wilck [Mon, 5 Mar 2018 23:14:54 +0000 (00:14 +0100)]
libmultipath: add vector_free_const()

... to dispose of constant vectors (const struct _vector*).

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipath-tools: Makefile.inc: use -Werror=cast-qual
Martin Wilck [Mon, 5 Mar 2018 23:14:53 +0000 (00:14 +0100)]
multipath-tools: Makefile.inc: use -Werror=cast-qual

Casting "const" away is often an indicator for wrong code.
Add a compiler flag to warn about such possible breakage.
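
A small illustration of the kind of code the flag targets. count_slashes()
is a made-up example, not taken from the multipath-tools sources:

```c
#include <stddef.h>

/*
 * With -Wcast-qual, a cast such as
 *     char *p = (char *)s;
 * is flagged because it silently drops the "const" qualifier.
 * Keeping the qualifier on the local pointer avoids the cast entirely.
 */
size_t count_slashes(const char *s)
{
	size_t n = 0;
	const char *p;

	for (p = s; *p != '\0'; p++)
		if (*p == '/')
			n++;
	return n;
}
```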

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: fix compiler warnings for -Wcast-qual
Martin Wilck [Mon, 5 Mar 2018 23:14:52 +0000 (00:14 +0100)]
libmultipath: fix compiler warnings for -Wcast-qual

Fix the warnings that were caused by adding the -Wcast-qual compiler
flag in the previous patch.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: use "const" in devmapper code
Martin Wilck [Mon, 5 Mar 2018 23:14:51 +0000 (00:14 +0100)]
libmultipath: use "const" in devmapper code

Improve use of "const" qualifiers in libmultipath's devmapper code.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath/print: use "const" where appropriate
Martin Wilck [Mon, 5 Mar 2018 23:14:50 +0000 (00:14 +0100)]
libmultipath/print: use "const" where appropriate

Convert the print.h/print.c code to use "const" qualifiers
properly. This is generally considered good programming practice,
and the printing code shouldn't change any objects anyway.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: don't update path groups when printing
Martin Wilck [Mon, 5 Mar 2018 23:14:49 +0000 (00:14 +0100)]
libmultipath: don't update path groups when printing

Updating the prio values for printing makes no sense. The user wants to see
the prio values multipath is actually using for path group selection, and
updating the values here means actually lying to the user if the prio values
have changed, but multipathd hasn't updated them internally.

If we really don't update the pathgroup prios when we need to, this should be
fixed elsewhere. The current wrong output would just hide that if it occurred.

Moreover, correctness forbids changing properties so deeply in a code path
that's supposed to print them only. Finally, this piece of code prevents the
print.c code from being converted to proper "const" usage.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: parser: use call-by-value for "snprint" methods
Martin Wilck [Mon, 5 Mar 2018 23:14:48 +0000 (00:14 +0100)]
libmultipath: parser: use call-by-value for "snprint" methods

Convert the snprint methods for all keywords to call-by-value,
and use "const" qualifier for the "data" argument. This makes sure
that "snprint" type functions don't modify the data they print,
helps compile-time correctness checking, and allows more proper
"const" cleanups in the future.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: get rid of selector "hack" in print.c
Martin Wilck [Mon, 5 Mar 2018 23:14:47 +0000 (00:14 +0100)]
libmultipath: get rid of selector "hack" in print.c

By properly linking the path groups with their parent multipath,
we don't need this "hack" any more.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: remove unused "stdout helpers"
Martin Wilck [Mon, 5 Mar 2018 23:14:46 +0000 (00:14 +0100)]
libmultipath: remove unused "stdout helpers"

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipath(d)/Makefile: add explicit dependency on libraries
Martin Wilck [Mon, 5 Mar 2018 23:14:45 +0000 (00:14 +0100)]
multipath(d)/Makefile: add explicit dependency on libraries

Otherwise the binaries won't be re-linked if the libraries change.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipath-tools: add INSPUR/MCS to hardware table
Tom Geng(耿芳忠) [Mon, 29 Jan 2018 06:04:29 +0000 (06:04 +0000)]
multipath-tools: add INSPUR/MCS to hardware table

Hi, Xose,
I sent the patch to dm-devel last week, but forgot to CC you. Please help to
review and have chance to submit to the mainline.
Thank you a lot.

From 091bae5fec22c61f0c3e6f9ab848fecae5203122 Mon Sep 17 00:00:00 2001
From: Tom Geng <gengfangzhong@inspur.com>
Date: Tue, 23 Jan 2018 15:33:09 +0800
Subject: [PATCH] multipath-tools: add INSPUR/MCS to hardware table

7 weeks agomultipath: print sysfs state in fast list mode
Benjamin Marzinski [Tue, 13 Feb 2018 03:42:14 +0000 (21:42 -0600)]
multipath: print sysfs state in fast list mode

commit b123e711ea2a4b471a98ff5e26815df3773636b5 "libmultipath: cleanup
orphan device states" stopped multipathd from showing old state for
orphan paths by checking if pp->mpp was set, and only printing the state
if it was.   Unfortunately, when "multipath -l" is run, pp->mpp isn't
set. While the checker state isn't checked and shouldn't be printed with
"-l", the sysfs state can be, and was before b123e711. This patch sets
pp->mpp in fast list mode, so that the sysfs state gets printed. It
also verifies that the path exists in sysfs, and if not, marks it as
faulty.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipathd: change spurious uevent msg priority
Benjamin Marzinski [Tue, 13 Feb 2018 03:42:13 +0000 (21:42 -0600)]
multipathd: change spurious uevent msg priority

The "spurious uevent, path already in pathvec" is not anything to worry
about, so it should not have the error priority.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agoFix set_no_path_retry() regression
Benjamin Marzinski [Tue, 13 Feb 2018 03:42:12 +0000 (21:42 -0600)]
Fix set_no_path_retry() regression

commit 0f850db7fceb6b2bf4968f3831efd250c17c6138 "multipathd: clean up
set_no_path_retry" has a bug in it. It made set_no_path_retry
never reset mpp->retry_ticks, even if the device was in recovery mode,
and there were valid paths. This meant that adding new paths didn't
remove a device from recovery mode, and queueing could get disabled,
even while there were valid paths. This patch fixes that.

This patch also fixes a bug in cli_restore_queueing() and
cli_restore_all_queueing(), where a device that had no_path_retry
set to "queue" would enter recovery mode (although queueing would
never actually get disabled).

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipathd: remove unused configure parameter
Benjamin Marzinski [Tue, 13 Feb 2018 03:42:11 +0000 (21:42 -0600)]
multipathd: remove unused configure parameter

configure() is always called with start_waiters=1, so there is no point
in having the parameter. Remove it.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipathd: remove coalesce_paths from ev_add_map
Benjamin Marzinski [Tue, 13 Feb 2018 03:42:10 +0000 (21:42 -0600)]
multipathd: remove coalesce_paths from ev_add_map

If ev_add_map is called for a multipath device that doesn't exist in
device-mapper, it will call coalesce_paths to add it.  This doesn't work
and hasn't for years. It doesn't add the map to the mpvec, or start up
waiters, or do any of the necessary things that do get done when you
call ev_add_map for a map that does exist in device mapper.

Fortunately, there are only two things that call ev_add_map. uev_add_map
makes sure that the device does exist in device-mapper before calling
ev_add_map, and cli_add_map creates the device first and then calls
ev_add_map, if the device doesn't exist.

So, there is no reason for coalesce_paths to be in ev_add_map. This
removes it.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipath: fix DEF_TIMEOUT use
Benjamin Marzinski [Tue, 13 Feb 2018 03:42:09 +0000 (21:42 -0600)]
multipath: fix DEF_TIMEOUT use

DEF_TIMEOUT is specified in seconds. The io_hdr timeout is specified in
milliseconds, so we need to convert it. Multipath should be waiting
longer than 30 milliseconds here. If there are concerns that 30 seconds
may be too long, we could always make this configurable, using
conf->checker_timeout if set.
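
The unit mismatch amounts to a missing factor of 1000. As a trivial sketch
(EXAMPLE_TIMEOUT_SECS and timeout_secs_to_msecs() are illustrative names, not
the identifiers used in the patch):

```c
/* Assumed for illustration: a default timeout of 30, given in seconds. */
#define EXAMPLE_TIMEOUT_SECS 30U

/* Convert a timeout in seconds to the milliseconds an SG_IO header expects. */
unsigned int timeout_secs_to_msecs(unsigned int secs)
{
	return secs * 1000U;
}
```

Passing the seconds value through unconverted would make the request time out
after 30 milliseconds instead of 30 seconds.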

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agolibmultipath: fix tur checker locking
Benjamin Marzinski [Tue, 13 Feb 2018 03:42:08 +0000 (21:42 -0600)]
libmultipath: fix tur checker locking

Commit 6e2423fd fixed a bug where the tur checker could cancel a
detached thread after it had exited. However in fixing this, the new
code grabbed a mutex (to call condlog) while holding a spin_lock.  To
deal with this, I've done away with the holder spin_lock completely, and
replaced it with two atomic variables, based on a suggestion by Martin
Wilck.

The holder variable works exactly like before.  When the checker is
initialized, it is set to 1. When a thread is created it is incremented.
When either the thread or the checker are done with the context, they
atomically decrement the holder variable and check its value. If it
is 0, they free the context. If it is 1, they never touch the context
again.

The other variable has changed. First, ct->running and ct->thread have
switched uses. ct->thread is now only ever accessed by the checker,
never the thread.  If it is non-NULL, a thread has started up, but
hasn't been dealt with by the checker yet. It is also obviously used
by the checker to cancel the thread.

ct->running is now an atomic variable.  When the thread is started
it is set to 1. When the checker wants to kill a thread, it atomically
sets the value to 0 and reads the previous value.  If it was 1,
the checker cancels the thread. If it was 0, then nothing needs to be
done.  After the checker has dealt with the thread, it sets ct->thread
to NULL.

Right before the thread finishes and pops the cleanup handler, it
atomically sets the value of ct->running to 0 and reads the previous
value. If it was 1, the thread just pops the cleanup handler and exits.
If it was 0, then the checker is trying to cancel the thread, and so the
thread calls pause(), which is a cancellation point.
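
The two-variable protocol described above can be sketched with C11 atomics.
This is only an illustration under assumptions: the patch may use a different
atomics library, and struct ctx and the ctx_* helpers are hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the checker context. */
struct ctx {
	atomic_int holders;	/* reference count shared by checker and thread */
	atomic_int running;	/* 1 while a started thread has not finished */
};

/* Checker initialisation: the checker itself is the first holder. */
void ctx_init(struct ctx *c)
{
	atomic_init(&c->holders, 1);
	atomic_init(&c->running, 0);
}

/* Called when a thread is created: one more holder, thread is running. */
void ctx_thread_started(struct ctx *c)
{
	atomic_fetch_add(&c->holders, 1);
	atomic_store(&c->running, 1);
}

/*
 * Drop one reference. Returns true if the caller was the last holder
 * (the count reached 0) and must free the context.
 */
bool ctx_put(struct ctx *c)
{
	return atomic_fetch_sub(&c->holders, 1) == 1;
}

/*
 * Atomically set running to 0 and return the previous value.
 * Both the checker (before cancelling) and the thread (before exiting)
 * call this; only the first caller sees 1.
 */
int ctx_clear_running(struct ctx *c)
{
	return atomic_exchange(&c->running, 0);
}
```

Because the exchange is atomic, exactly one side observes the previous value 1,
which is what decides whether the checker cancels the thread or the thread
waits to be cancelled.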

Cc: Martin Wilck <mwilck@suse.com>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
7 weeks agomultipath-tools: handle exit signal immediately
Martin Wilck [Tue, 30 Jan 2018 14:16:24 +0000 (15:16 +0100)]
multipath-tools: handle exit signal immediately

multipathd shouldn't try to service any more client connections
when it receives an exit signal. Moreover, ppoll() can return
success even if signals occurred. So check for reconfigure or
log_reset signals after handling pending client requests.

Based on an analysis by Chongyun Wu.

Reported-by: Chongyun Wu <wu.chongyun@h3c.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: increase path product_id/rev field size for NVMe
Martin Wilck [Fri, 19 Jan 2018 11:55:35 +0000 (12:55 +0100)]
libmultipath: increase path product_id/rev field size for NVMe

NVMe allows longer strings for the model (product) and firmware rev
than SCSI.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipath-tools: add dependency tracking to Makefiles
Martin Wilck [Fri, 19 Jan 2018 00:19:44 +0000 (01:19 +0100)]
multipath-tools: add dependency tracking to Makefiles

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: ignore natively multipathed NVME devices
Martin Wilck [Fri, 19 Jan 2018 00:19:43 +0000 (01:19 +0100)]
libmultipath: ignore natively multipathed NVME devices

Such devices have a parent with SUBSYSTEM="nvme-subsystem", not "nvme".
Furthermore, avoid possible segfaults by adding NULL checks.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipath.rules: handle NVME devices
Martin Wilck [Fri, 19 Jan 2018 00:19:42 +0000 (01:19 +0100)]
multipath.rules: handle NVME devices

Note: ID_WWN is set in 60-persistent-storage.rules in current systemd.
That won't work well together with us installing multipath.rules as
56-multipath.rules -  multipath -u won't see ID_WWN.

However, we have strong reasons to run before blkid is invoked.
ID_WWN determination for NVMe should be moved to an earlier udev rule.
See systemd pull request #7594 on github.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipathd: ignore uevents for non-mpath devices
Martin Wilck [Mon, 29 Jan 2018 23:57:50 +0000 (00:57 +0100)]
multipathd: ignore uevents for non-mpath devices

multipathd can't deal with other devices anyway. Proceeding further
with events for other devices just generates log noise.

Based on an idea from Ritika Srivastava <ritika.srivastava@oracle.com>.
("multipath-tools: Skip CHANGE uevent for non-mpath devices").

Changes in v2: always return immediately for non-mpath case
  (Ritika Srivastava)

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: add uevent_is_mpath
Martin Wilck [Wed, 17 Jan 2018 07:49:38 +0000 (08:49 +0100)]
libmultipath: add uevent_is_mpath

This function can be used to test if a uevent belongs to a valid
multipath device. Unit tests are also added.
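
One plausible shape for such a test is a prefix check on the device-mapper
UUID. The sketch below is an assumption, not the function added by this patch:
uuid_is_mpath() is a hypothetical helper, and the prefix value "mpath-" is
assumed rather than taken from this log:

```c
#include <string.h>

/* Assumed prefix value, used here for illustration only. */
#define EXAMPLE_UUID_PREFIX "mpath-"

/*
 * Return 1 if `dm_uuid` starts with the multipath prefix and has a
 * non-empty remainder, 0 otherwise (including for a NULL pointer).
 */
int uuid_is_mpath(const char *dm_uuid)
{
	size_t plen = strlen(EXAMPLE_UUID_PREFIX);

	if (dm_uuid == NULL)
		return 0;
	if (strncmp(dm_uuid, EXAMPLE_UUID_PREFIX, plen) != 0)
		return 0;
	return dm_uuid[plen] != '\0';
}
```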

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: move UUID_PREFIX to devmapper.h
Martin Wilck [Wed, 17 Jan 2018 07:49:37 +0000 (08:49 +0100)]
libmultipath: move UUID_PREFIX to devmapper.h

This constant is useful elsewhere, too.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: const qualifier for wwid and alias
Martin Wilck [Wed, 17 Jan 2018 07:49:36 +0000 (08:49 +0100)]
libmultipath: const qualifier for wwid and alias

Add "const" qualifiers to some function arguments and fields,
in order to tidy up const handling of libmultipath.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agolibmultipath: refactor uevent_get_XXX
Martin Wilck [Wed, 17 Jan 2018 07:49:35 +0000 (08:49 +0100)]
libmultipath: refactor uevent_get_XXX

Use common helper functions for uevent_get_XXX.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agotests: cmocka-based unit test for uevent_get_XXX
Martin Wilck [Wed, 17 Jan 2018 07:49:34 +0000 (08:49 +0100)]
tests: cmocka-based unit test for uevent_get_XXX

This patch starts a simple unit test framework for multipath-tools
based on the cmocka framework (https://cmocka.org/). As a start,
it adds unit tests for the uevent_get_XXX set of functions.

Note that some tests currently fail. This will be fixed by the
following patches.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agoassemble_map: no newline at end of params string
Martin Wilck [Wed, 17 Jan 2018 07:49:33 +0000 (08:49 +0100)]
assemble_map: no newline at end of params string

This newline is superfluous for device-mapper, and causes ugly
debug output.

Signed-off-by: Martin Wilck <mwilck@suse.com>
7 weeks agomultipathd: release uxsocket and resource when cancel thread
Wuchongyun [Wed, 17 Jan 2018 08:15:05 +0000 (08:15 +0000)]
multipathd: release uxsocket and resource when cancel thread

Hi Martin,
I think it's a good idea to move the close(ux_sock) call further up to avoid new clients trying to connect. Below is the new patch addressing your comment. Please help to review. Thanks.

Issue description: when multipathd initializes and calls uxsock_listen
to create the unix domain socket, socket creation fails with -1 and
errno 98, and uxsock_listen returns NULL. After multipathd has started,
it can no longer receive any multipathd commands from users, whether to
finish creating a new multipath or to perform any other operation.

We found that the uxlsnr thread's cleanup function neither closes the
sockets nor releases the clients when the thread is cancelled; the
domain socket is left to be released by the system. In some
environments, for example when the machine's load is very heavy, the
system may not have closed the old domain socket by the time we try to
create and bind the new one, which may then fail with errno 98
(Address already in use).

We also ran some experiments:
if uxsock_cleanup closes ux_sock first and ux_socket_listen is called
immediately afterwards to create a new ux_sock, initialization
succeeds; if ux_sock is not closed, ux_socket_listen returns
-1 with errno = 98.

So we believe that closing the uxsocket and releasing the clients when
the thread is cancelled ensures that a newly started multipathd thread
can create a new uxsocket successfully and receive multipathd commands
properly. This patch also fixes a memory leak of the clients.
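
The general pattern, a cancellation cleanup handler that closes the listening
socket, might look roughly like this. struct listener_state and
listener_cleanup() are hypothetical and do not mirror multipathd's actual
data structures:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical listener state; not multipathd's actual structures. */
struct listener_state {
	int sock_fd;
};

/*
 * Cleanup handler with the signature expected by pthread_cleanup_push().
 * Closing the socket here, rather than leaving it to the system,
 * lets a restarted listener bind the same address again.
 */
void listener_cleanup(void *arg)
{
	struct listener_state *st = arg;

	if (st->sock_fd >= 0) {
		close(st->sock_fd);
		st->sock_fd = -1;
	}
}
```

The listener thread would register the handler with
pthread_cleanup_push(listener_cleanup, &state) before entering its loop, so
that it runs when the thread is cancelled.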

Signed-off-by: Chongyun Wu <wu.chongyun@h3c.com>
7 weeks agolibmultipath: path latency: remove warnings
Martin Wilck [Sat, 13 Jan 2018 21:19:38 +0000 (22:19 +0100)]
libmultipath: path latency: remove warnings

The warnings here are pointless. We are looking at a single
path only. Firstly, the standard deviation for this measurement
can't be "too low" - the lower, the more precise the measurement,
the better. Secondly, a high standard deviation indicates an
unstable path with highly variable latency. Not good, but nothing
to warn about here.

What matters for the selection of "base_num" is not how a single
path behaves, but how different paths of the same path group relate
to each other, which we don't know at this point in the code.

What we want to avoid is too fine a differentiation, in particular
in combination with group_by_prio, because we'd lose the ability for
load balancing. But this is rather a topic for the man page or a
"best practices" document.

7 weeks agolibmultipath: path latency: simplify getprio()
Martin Wilck [Sat, 13 Jan 2018 21:19:37 +0000 (22:19 +0100)]
libmultipath: path latency: simplify getprio()

The log standard deviation can be calculated much more simply
by realizing

   sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2

Also, use timespecsub rather than the custom timeval_to_usec,
and avoid taking log(0).
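
The identity above means the sum of squared deviations can be computed from
two running sums in a single pass. A minimal sketch, where sum_sq_dev() is a
hypothetical helper and the inputs stand for already-computed log latencies:

```c
#include <stddef.h>

/*
 * Sum of squared deviations from the mean, computed as
 *     sum(x_i^2) - n * avg(x)^2
 * so that the samples only need to be visited once.
 */
double sum_sq_dev(const double *x, size_t n)
{
	double sum = 0.0, sum_sq = 0.0, avg;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++) {
		sum += x[i];
		sum_sq += x[i] * x[i];
	}
	avg = sum / (double)n;
	return sum_sq - (double)n * avg * avg;
}
```

For the samples 1, 2, 3, 4 both sides of the identity give 5: the direct form
is 2.25 + 0.25 + 0.25 + 2.25, and the single-pass form is 30 - 4 * 2.5^2.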

7 weeks agolibmultipath: path latency: log threshold with p2
Martin Wilck [Sat, 13 Jan 2018 21:19:36 +0000 (22:19 +0100)]
libmultipath: path latency: log threshold with p2

This is not a critical error. It just means that the path in
question will have low priority (rightly so, if it has >100s latency).