Hannes Reinecke [Wed, 16 Jan 2013 12:14:15 +0000 (13:14 +0100)]
Set I_T_nexus_loss_timeout on SAS devices
Some SAS drivers have an I_T_nexus_loss setting, which works
similarly to the 'dev_loss_tmo' setting on FibreChannel.
So update it with the current values, too.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:14 +0000 (13:14 +0100)]
Set recovery_tmo for iSCSI devices
iSCSI has a 'recovery_tmo' value, which works similarly to the
'fast_io_fail' mechanism on FibreChannel.
So we should be setting it from multipath, too.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:13 +0000 (13:14 +0100)]
Discover target ids for ATA
libata devices now have a separate sysfs entry, so we should be
discovering them properly, too.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:12 +0000 (13:14 +0100)]
Use transport identifiers when detecting devices
This patch stores the transport identifiers, allowing
for a simplified dev_loss_tmo setting.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:11 +0000 (13:14 +0100)]
Remove unused structures
structs scsi_dev and scsi_idlun are unused.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:10 +0000 (13:14 +0100)]
Add SUN STK6580 to hardware table
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:09 +0000 (13:14 +0100)]
Add Datacore SANSymphony to hwtable
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:08 +0000 (13:14 +0100)]
Add hardware entry for Intel Multi-Flex
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:07 +0000 (13:14 +0100)]
Correct persistent symlink for cciss
cciss devices have the prefix 'cciss', so we should generate the
correct one from kpartx_id, too.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:06 +0000 (13:14 +0100)]
kpartx.rules: Check for accessible device-mapper device
We need to check for accessible device-mapper devices
right at the start, otherwise kpartx would be run on
inactive devices.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:05 +0000 (13:14 +0100)]
Check for !SUSPENDED in kpartx rules
Read-only devices appear as DM_STATE=READONLY, so we should
invert the check in kpartx rules to have kpartx run on
readonly devices, too.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Wed, 16 Jan 2013 12:14:04 +0000 (13:14 +0100)]
Fixup .gitignore
Only exclude the binaries, not the entire directories ...
And the nfs link, too.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Christophe Varoqui [Wed, 16 Jan 2013 20:40:22 +0000 (21:40 +0100)]
Update multipath.conf.defaults
the defaults.multipath_dir default value is set upon make, so
it may differ on different systems. Best discard it from the
multipath.conf.defaults.
Reported by Xose Vazquez Perez <xose.vazquez@gmail.com>
Christophe Varoqui [Tue, 15 Jan 2013 20:57:27 +0000 (21:57 +0100)]
Refresh the multipath.conf.defaults
The command used to refresh multipath.conf.defaults on a system
with no /etc/multipath.conf installed is:
The explicative header must be added manually afterwards.
Christophe Varoqui [Mon, 14 Jan 2013 09:09:49 +0000 (10:09 +0100)]
Fix some compilation -Wextra warnings
Reported by Xose Vazquez Perez.
The dasd struct change was acked by Stefan Weinhuber.
Christophe Varoqui [Sat, 12 Jan 2013 13:07:09 +0000 (14:07 +0100)]
Add missing log functions from Hannes tree
log_thread_flush()
log_reset()
Ritesh Raj Sarraf [Wed, 9 Jan 2013 10:05:56 +0000 (15:35 +0530)]
Drop useless link to curses library
Description: Do not link against ncurses unnecessarily
Author: Sven Joachim <svenjoac@gmx.de>
Bug-Debian: http://bugs.debian.org/646148
Last-Update: <2011-11-05>
Signed-off-by: Ritesh Raj Sarraf <rrs@debian.org>
Ritesh Raj Sarraf [Wed, 9 Jan 2013 10:05:55 +0000 (15:35 +0530)]
Minor spelling error fixes for Debian's lintian cleanliness
fix missed-out hyphen
Signed-off-by: Ritesh Raj Sarraf <rrs@debian.org>
Guido Günther [Wed, 9 Jan 2013 10:05:54 +0000 (15:35 +0530)]
explicitly include posix_types.h
to get the correct type for __kernel_old_dev_t
Closes: #558990
Signed-off-by: Ritesh Raj Sarraf <rrs@debian.org>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:55 +0000 (00:04 -0600)]
multipath: make path devices readonly again.
Path device fds were changed to be opened read/write when the
mpathpersist code was added. However, I have talked with Vijay, and
this doesn't appear to be necessary for mpathpersist to work correctly.
If the path fds are opened read/write, when they are closed a change
uevent is triggered, which was causing problems during shutdown with
LVM on top of multipath devices. This patch reverts them to being read-only
again.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:53 +0000 (00:04 -0600)]
multipath: Check blacklists as soon as possible
Multipath does a lot of unnecessary work on devices blacklisted by device
type or wwid before ignoring them. When dealing with a large number of
devices blacklisted this way, multipath can take a long time to complete.
The patch makes sure that multipath is checking the blacklists as soon
as it has the necessary information to do so. To do this, pathinfo() now
takes another flag DI_BLACKLIST, which is only used by store_pathinfo(),
that tells it to check if the device should be blacklisted. Doing this
cleanly also required changing how store_pathinfo() and rlookup_binding()
are called.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:54 +0000 (00:04 -0600)]
multipath: check for NULL from udev_device_get_*
The udev_device_get_* functions can return NULL, and occasionally do
so in the multipathd code. multipath needs to check if the result
is NULL before dereferencing it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:52 +0000 (00:04 -0600)]
multipath: Fix kpartx and udevd race
Sometimes when kpartx is used to view partition data on disk image files,
udev still has the loop device open when kpartx is trying to tear it down.
This causes the LOOP_CLR_FD ioctl to fail with EBUSY. kpartx now retries
in this case.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
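The retry described above can be sketched roughly as follows. This is a minimal illustration, not the kpartx code: retry_on_ebusy() and busy_n_times() are hypothetical names, and busy_n_times() merely stands in for the LOOP_CLR_FD ioctl so the sketch needs no loop device.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: call op(ctx) until it stops failing with EBUSY,
 * or until max_tries attempts have been made. Returns the last result. */
int retry_on_ebusy(int (*op)(void *), void *ctx, int max_tries, long delay_ms)
{
    struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    int rc = -1;

    assert(op != NULL);
    for (int i = 0; i < max_tries; i++) {
        if (i > 0)
            nanosleep(&ts, NULL);   /* give the other holder time to close */
        rc = op(ctx);
        if (rc == 0 || errno != EBUSY)
            break;                  /* success, or an error retrying won't fix */
    }
    return rc;
}

/* Stand-in for the real ioctl: fails with EBUSY *ctx more times, then succeeds. */
int busy_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

A caller would pass a small wrapper around the actual ioctl as op; the retry count and delay are tuning choices that the commit message does not specify.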
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:51 +0000 (00:04 -0600)]
multipath: storagetek 6180 config
New StorageTek default config.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:50 +0000 (00:04 -0600)]
multipath: make reservation_key print out correctly
This patch fixes the reservation_key print functions, so they print
it out like it was in the configuration file. Also, it keeps
cli_getprstatus() from writing over random memory.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:49 +0000 (00:04 -0600)]
multipath: make multipathd work with new dm/lvm
multipathd needs to get setup before lvm (lvm2-activation-early.service).
To do this, it needs to get started in sysinit.target like the other
dm services.
Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:48 +0000 (00:04 -0600)]
multipath: update documentation
Add some missing options to the documentation.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:47 +0000 (00:04 -0600)]
multipath: rely on udev device creation for kpartx
Since kpartx and multipathd don't wait on udev creating the device, there
was a race between libdevmapper and udev to create the device. This meant
that sometimes the /dev/mapper/ devices were devnodes, and sometimes they
were symlinks. Now, for multipathd and kpartx called without -s,
libdevmapper won't create the device nodes, so that udev will always be
responsible for it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:46 +0000 (00:04 -0600)]
multipath: fix set_oom_adj
Fix set_oom_adj() to work correctly if OOM_ADJUST_MIN isn't
defined.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:45 +0000 (00:04 -0600)]
multipath: update netapp config
The netapp config now uses the retain_attached_hw_handler and
detect_prio options to automatically configure correctly for both ALUA and
non-ALUA setups. They also requested that dev_loss_tmo be set to
the maximum for netapp devices.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:44 +0000 (00:04 -0600)]
multipath: add detect_prio option to autodetect
This patch adds a new multipath.conf option, detect_prio. If set to yes,
multipathd will try to determine the correct prioritizer for the device. If
it finds one, that will be used instead of its configured prioritizer. If
none is found, the configured prioritizer will be used. It can currently
only detect ALUA devices.
Also fixed an issue with select_prio where in the devices section, it was
passing in the prio name string instead of the prio args string to prio_get().
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:43 +0000 (00:04 -0600)]
multipath: remove duplicates from multipath
Added code to remove duplicate entries in the devices section, and the
blacklist devices section of the builtin configuration table. The only
change to setup_default_blist is the addition of _blacklist_device()
to check if the device's bl_product entry already exists.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:42 +0000 (00:04 -0600)]
multipath: add support for
This adds support for the retain_attached_hw_handler option. When this is set to
"yes", if the kernel has already attached a hardware handler to the multipath
path devices, multipath will use that one, instead of its configured one. If
no hardware handler is already attached, multipath uses its configured one, if
any. To properly ignore this on older kernels, I had to make some changes
to the version checking code.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:41 +0000 (00:04 -0600)]
multipath: make devt2devname use lstat to check
dm_reassign wasn't working correctly for me because devt2devname used stat()
to check if /sys/dev/block/major:minor was a symlink. But stat() never returns
a symlink, it follows it. It needs to use lstat() instead. Also, I made
multipath log a message when a dm device gets reassigned to use multipath.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:40 +0000 (00:04 -0600)]
multipath: change default path_selector to
My testing has shown service-time to be as good as, and occasionally
noticeably better than, round-robin. So the patch switches the
default selector to give better performance out of the box.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:39 +0000 (00:04 -0600)]
multipath: remove default options from built-in
Since changing the path_selector, rr_min_io and rr_min_io_rq options
won't actually break the device configs, there's no reason to set
them equal to the default values. They can just inherit them from
the defaults section.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Sat, 12 Jan 2013 06:04:38 +0000 (00:04 -0600)]
multipath: default fast_io_fail to 5 seconds
This patch sets fast_io_fail to 5 seconds in the defaults section.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Christophe Varoqui [Wed, 9 Jan 2013 07:41:51 +0000 (08:41 +0100)]
Revert "Display avg priority as group priority"
This reverts commit
750db1f0dc08bf1d96ce64d32786dfe07b8ae3f9.
Do not compute pathgroup average priority twice on print.
Caught by Mike Christie.
Christophe Varoqui [Wed, 9 Jan 2013 00:09:54 +0000 (01:09 +0100)]
Add missing structs.h include in the iet prioritizer
Patch
fb11301c81c189954b704a9bfff951ba62c3c868 causes structs.h
to be included in checkers. The iet checker was forgotten.
Hannes Reinecke [Tue, 8 Jan 2013 13:54:19 +0000 (14:54 +0100)]
multipathd: lock vectors during initial configuration
During initial configuration the CLI thread is already running,
so we need to lock the vectors here to not race with the
'reconfigure' CLI command.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:18 +0000 (14:54 +0100)]
multipathd: crash in reconfigure CLI command
The 'reconfigure' CLI command doesn't take the vector lock,
so if multipathd is processing a table / udev event at the
same time it'll crash.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:17 +0000 (14:54 +0100)]
multipathd: sighandlers might use uninitialized gvecs
gvecs are initialized after signal handlers, which in turn
might access the vectors.
So the signal handlers might access uninitialized variables.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:16 +0000 (14:54 +0100)]
multipathd deadlocks during restart
During restart multipathd might deadlock as the uevent handler
is missing a cleanup handler. Thus the thread might be terminated
while it still holds the vector lock.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:15 +0000 (14:54 +0100)]
multipathd: Ignore errors when creating pidfile
We can use CLI commands to communicate with the daemon,
so we don't need the pidfile for correct operation.
Hence any errors from creating the pidfile can be safely
ignored.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:14 +0000 (14:54 +0100)]
Clarify dev_loss_tmo capping in multipath.conf.5
The linux kernel will not allow any dev_loss_tmo setting larger
than 300 if fast_io_fail is not set. So we should document this
in the manpage.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Christophe Varoqui [Tue, 8 Jan 2013 23:24:20 +0000 (00:24 +0100)]
Remove a conflict resolution left-over in multipath.conf.5
Hannes Reinecke [Tue, 8 Jan 2013 13:54:13 +0000 (14:54 +0100)]
multipath.conf.5: Clarify dev_loss_tmo settings
We need to document that dev_loss_tmo is in fact modified
by no_path_retry.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:12 +0000 (14:54 +0100)]
multipath.init.suse: Update usage message
The usage message in multipath.init.suse doesn't list 'reload'.
And 'reload' is a misnomer, as it's actually a restart.
So rename 'reload' to 'restart' and add to the usage message.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:11 +0000 (14:54 +0100)]
Syntax error in /etc/init.d/boot.multipath
The multipath init script has a syntax error in line 113.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:10 +0000 (14:54 +0100)]
Make 'allocated' an integer in vector.h
I don't trust the programmers here, as we're unconditionally
decreasing the 'allocated' setting on vector_free().
So better make that an integer to catch underflows.
Signed-off-by: Hannes Reinecke <hare@suse.de>
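A small sketch of why a signed counter makes underflow visible. It assumes the field was previously an unsigned type, which the commit message does not state; all function names are hypothetical.

```c
#include <assert.h>

/* With an unsigned counter, 0 - 1 wraps around to a huge value that
 * looks like a very large (but valid) allocation count. */
unsigned int shrink_unsigned(unsigned int allocated)
{
    return allocated - 1;
}

/* With a signed counter, 0 - 1 is -1, which is obviously invalid. */
int shrink_signed(int allocated)
{
    return allocated - 1;
}

/* A negative count can then be caught by a simple check. */
int shrink_checked(int allocated)
{
    allocated -= 1;
    assert(allocated >= 0);     /* trips on underflow in debug builds */
    return allocated;
}
```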
Hannes Reinecke [Tue, 8 Jan 2013 13:54:09 +0000 (14:54 +0100)]
Use VECTOR_SIZE() defines
The size of a vector slot might be larger than one, so we should
be using the VECTOR_SIZE() define everywhere.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:08 +0000 (14:54 +0100)]
Fix race condition in stop_waiter_thread()
The signal handler might run before we had a chance to
set the 'waiter' context to '0', so better do it beforehand.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:59 +0000 (14:53 +0100)]
Double free in disassemble_map()
Label 'out1' already frees 'word'; no need to do it here.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:07 +0000 (14:54 +0100)]
libmultipath: Print out uevent sequence number
For debugging we should be printing out the sequence number.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:06 +0000 (14:54 +0100)]
Clean up uevent queue on shutdown
During shutdown there might be some unprocessed events
in the queue. So clear them up.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:05 +0000 (14:54 +0100)]
Update 'no_path_retry' correctly for failed paths
The bug is triggered if a path-failed event is received by multipathd after all
paths have already been marked as failed. Surprisingly enough, it seems to
happen quite often; a colleague of mine who tested this hit this bug every time.
Here is event sequence that explains this bug. I left some messages for
clarity; full log is available on request. We have completed initialization and
set feature queue_if_no_path for map CX_201 by virtue of using no_path_retry >
0.
Aug 31 10:49:09 | CX_201: devmap event #18
Aug 31 10:49:09 | CX_201: discover
Aug 31 10:49:09 | CX_201: rr_weight = 1 (internal default)
Aug 31 10:49:09 | CX_201: pgfailback = -2 (controller setting)
Aug 31 10:49:09 | CX_201: no_path_retry = 2 (controller setting)
Aug 31 10:49:09 | pg_timeout = NONE (internal default)
Aug 31 10:49:09 | 65:192: mark as failed
Aug 31 10:49:09 | CX_201: remaining active paths: 3
Aug 31 10:49:09 | 8:192: mark as failed
Aug 31 10:49:09 | CX_201: remaining active paths: 2
Aug 31 10:49:09 | CX_201: devmap event #19
Aug 31 10:49:09 | CX_201: discover
Aug 31 10:49:09 | CX_201: rr_weight = 1 (internal default)
Aug 31 10:49:09 | CX_201: pgfailback = -2 (controller setting)
Aug 31 10:49:09 | CX_201: no_path_retry = 2 (controller setting)
Aug 31 10:49:09 | pg_timeout = NONE (internal default)
Two paths failed by driver, multipathd marked them as failed.
Aug 31 10:49:09 | checker failed path 66:0 in map CX_201
Aug 31 10:49:09 | CX_201: remaining active paths: 1
Checker failed third path
Aug 31 10:49:09 | checker failed path 8:96 in map CX_201
Aug 31 10:49:09 | CX_201: Entering recovery mode: max_retries=2
Aug 31 10:49:09 | CX_201: remaining active paths: 0
Checker failed last path; multipathd entered retry loop.
Aug 31 10:49:10 | CX_201: devmap event #20
We got late event about failed path
Aug 31 10:49:10 | CX_201: discover
Start discovery. Call update_multipath -> setup_multipath ->
update_multipath_strings -> update_multipath_table -> disassemble_map.
Now disassemble_map tries to set no_path_retry value from kernel. This
obviously is not going to work as the kernel is able to remember only a Boolean
(queue/fail), while no_path_retry is an arbitrary integer. So no_path_retry is set
to NO_PATH_RETRY_QUEUE from kernel.
Aug 31 10:49:10 | CX_201: rr_weight = 1 (internal default)
Aug 31 10:49:10 | CX_201: pgfailback = -2 (controller setting)
At this point we call set_no_path_retry:
set_no_path_retry(struct multipath *mpp)
{
mpp->retry_tick = 0;
mpp->nr_active = pathcount(mpp, PATH_UP) + pathcount(mpp, PATH_GHOST);
if (mpp->nr_active > 0)
select_no_path_retry(mpp);
So
1) retry_tick is reset
2) nr_active = 0 (no active path)
3) we do not set no_path_retry from config file because nr_active == 0 => left
with NO_PATH_RETRY_QUEUE.
Aug 31 10:49:10 | pg_timeout = NONE (internal default)
From now on there are no state changes, so the map is hung forever.
Signed-off-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
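The sequence described in this commit message can be replayed with a simplified model. Everything here is a stand-in: the struct, the function names and the value of SKETCH_RETRY_QUEUE are illustrative and do not reproduce libmultipath; the model only mirrors the three steps listed above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RETRY_QUEUE (-2)   /* stand-in for NO_PATH_RETRY_QUEUE; value is illustrative */

struct mpp_sketch {
    int no_path_retry;      /* setting currently in effect */
    int configured_retry;   /* value from the config file, e.g. 2 */
    int nr_active;
    int retry_tick;
};

/* The kernel only reports queue/fail, so the integer setting is lost. */
void sketch_disassemble_map(struct mpp_sketch *mpp, int kernel_is_queueing)
{
    assert(mpp != NULL);
    if (kernel_is_queueing)
        mpp->no_path_retry = SKETCH_RETRY_QUEUE;
}

/* Mirrors the excerpt: the configured value is only restored when at
 * least one path is active. */
void sketch_set_no_path_retry(struct mpp_sketch *mpp, int active_paths)
{
    assert(mpp != NULL);
    mpp->retry_tick = 0;
    mpp->nr_active = active_paths;
    if (mpp->nr_active > 0)
        mpp->no_path_retry = mpp->configured_retry;
}
```

Running the two calls with zero active paths leaves the model queueing indefinitely with retry_tick reset, which matches the hang described; with one active path the configured value is restored.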
Hannes Reinecke [Tue, 8 Jan 2013 13:54:04 +0000 (14:54 +0100)]
Print log messages when updating tables failed
Add some logging messages to identify the case of failure.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:03 +0000 (14:54 +0100)]
Make log_pthread more robust
We don't need to allocate memory for mutexes, we can just
be using static variables. And valgrind complained about
logqueue flush from shutdown, so don't do this.
The normal shutdown process should be flushing the log
queue anyway.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:02 +0000 (14:54 +0100)]
Do not call sysfs_get_timeout for non-SCSI devices
Only SCSI devices have a timeout, so there is no point in
trying to set a timeout for other types.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:01 +0000 (14:54 +0100)]
Path checker should return PATH_DOWN when no path is found
If the path checker fails to lookup the path in sysfs it's
already gone, so we should rather return 'PATH_DOWN' here.
Otherwise the path will never be marked failed and no failover
will happen.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:54:00 +0000 (14:54 +0100)]
libmultipath: prio keyword ignored for multipath config
When specifying the 'prio' keyword in the multipath section
of the configuration file the value is ignored.
Problem is that the 'wwid' value is set only after the call
to select_prio(), so the correct definition couldn't be
found in the config file.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:58 +0000 (14:53 +0100)]
Switch off 'queue_if_no_path' before removing maps
Before we try to flush a map we have to switch off the
'queue_if_no_path' setting to flush any outstanding I/O.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:57 +0000 (14:53 +0100)]
Inconsistent string quoting
When printing the hardware table strings are quoted twice, and
even numerical values have single quotes. This patch removes
the double quotes and retains single quotes only for string
values.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:56 +0000 (14:53 +0100)]
Check return code from pathinfo()
Pathinfo might fail, which indicates that the path is not
available anymore. So check the return value and take
appropriate action.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:55 +0000 (14:53 +0100)]
Increase parameter buffer
Multipath is using an internal static buffer for assembling
device-mapper tables, which might be too small for large setups.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:54 +0000 (14:53 +0100)]
libmultipath: error checking in remove_features()
An error check was missing in remove_features().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:53 +0000 (14:53 +0100)]
Clarify setting origin in propsel.c
We should differentiate between config file and internal default.
And the controller settings are not defaults.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:52 +0000 (14:53 +0100)]
Print out multipath alias for flush_on_last_del messages
Added for consistency.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:51 +0000 (14:53 +0100)]
Incorrect inquiry vendor length in hds prioritizer
The inquiry vendor length is 8 bytes, but snprintf writes
the given number of bytes _including_ the NULL byte. So
we need to supply a 9 byte buffer here.
Signed-off-by: Hannes Reinecke <hare@suse.de>
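The off-by-one can be reproduced with a short sketch. copy_vendor() is a hypothetical helper, not the hds prioritizer code; the only real API used is snprintf(), whose size argument counts the terminating NUL byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INQ_VENDOR_LEN 8    /* vendor field length in bytes, per the text above */

/* Copies at most INQ_VENDOR_LEN characters of the vendor field into dst
 * and returns the length of the resulting string. */
size_t copy_vendor(char *dst, size_t dst_size, const char *inq_vendor)
{
    assert(dst != NULL && inq_vendor != NULL && dst_size > 0);
    /* snprintf() writes at most dst_size bytes INCLUDING the NUL, so an
     * 8-byte buffer holds only 7 characters of the vendor string. */
    snprintf(dst, dst_size, "%.*s", INQ_VENDOR_LEN, inq_vendor);
    return strlen(dst);
}
```

With an 8-character vendor string, an 8-byte destination ends up with 7 characters, while a 9-byte destination holds the full field.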
Hannes Reinecke [Tue, 8 Jan 2013 13:53:50 +0000 (14:53 +0100)]
Valgrind fixes for prioritizer
Declaring an array does not zero out its contents. So we might
be reading random garbage here.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:49 +0000 (14:53 +0100)]
Checker name is not displayed on failure
If add_checker() isn't able to locate the checker
it won't display the name in free_checker().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:48 +0000 (14:53 +0100)]
Introduce MP_FAST_IO_FAIL_UNSET
For completeness; all other special values are encoded with
defines, so 'unset' should be, too.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:47 +0000 (14:53 +0100)]
Do not trigger a map reload on priority updates
update_path_groups() is just there to update the priority groups,
so it should trigger a table reload only if the priority has
indeed changed.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:46 +0000 (14:53 +0100)]
libmultipath: Fix typo in mp_prio_handler()
The mpentry is found in conf->mptable, not conf->hwtable.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:45 +0000 (14:53 +0100)]
Add TAGS makefile target
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:44 +0000 (14:53 +0100)]
Accept several whitespaces in bindings file
Prior versions of multipathd would accept several whitespaces
in the bindings file.
Signed-off-by: Hannes Reinecke <hare@suse.de>
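One way to tolerate runs of whitespace between the alias and the WWID is sketched below. This is an illustration only; parse_binding() and the buffer sizes are assumptions, and the actual multipathd parser may be structured differently.

```c
#include <assert.h>
#include <stdio.h>

#define ALIAS_MAX 64    /* illustrative buffer sizes */
#define WWID_MAX 128

/* Splits a bindings line into alias and wwid.
 * Returns 1 if both fields were found, 0 otherwise. */
int parse_binding(const char *line, char alias[ALIAS_MAX], char wwid[WWID_MAX])
{
    assert(line != NULL && alias != NULL && wwid != NULL);
    if (line[0] == '#')
        return 0;               /* comment line */
    /* %s skips leading whitespace, so any mix of blanks and tabs
     * between the two fields is accepted. */
    return sscanf(line, "%63s %127s", alias, wwid) == 2;
}
```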
Petr Uzel [Tue, 8 Jan 2013 13:53:43 +0000 (14:53 +0100)]
prio: fix merging of prioritizers with different args
Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:42 +0000 (14:53 +0100)]
libmultipath: resource leak in read_value_block()
read_value_block() allocates the vector 'elements', but
doesn't free it on error.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:41 +0000 (14:53 +0100)]
Fixup pathgroup allocation in disassemble_map()
The check for empty path groups in disassemble_map() is not quite
correct; we might end up removing the pathgroup vector even though
there are some entries in it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:40 +0000 (14:53 +0100)]
Remove newline from condlog()
condlog() already adds a newline to each message.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Hannes Reinecke [Tue, 8 Jan 2013 13:53:39 +0000 (14:53 +0100)]
libmultipath: Invalid check for mpp->wwid in dm_addmap()
mpp->wwid is an array, and so a check against NULL
is wrong; we need to use strlen here.
Found by coverity.
Signed-off-by: Hannes Reinecke <hare@suse.de>
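The difference between the two checks can be sketched as follows. struct mp_sketch is a hypothetical stand-in for the real multipath structure, and the array size is illustrative.

```c
#include <assert.h>
#include <string.h>

#define WWID_SIZE 128               /* illustrative size */

struct mp_sketch {                  /* hypothetical stand-in */
    char wwid[WWID_SIZE];
};

/* Returns 1 if a WWID has been stored, 0 if the field is empty. */
int has_wwid(const struct mp_sketch *mpp)
{
    assert(mpp != NULL);
    /* mpp->wwid is an array inside the struct, so comparing it against
     * NULL is always true; the content has to be tested instead. */
    return strlen(mpp->wwid) > 0;
}
```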
Mike Christie [Tue, 8 Jan 2013 19:36:34 +0000 (13:36 -0600)]
libmultipath: fix segfault when vector is null
While performing tests that caused paths to get added
and deleted, we hit a segfault. We traced it to the
vector struct being NULL. This patch fixes the problem
by checking for a NULL vector before accessing it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Phillip Susi [Sun, 6 Jan 2013 02:57:30 +0000 (21:57 -0500)]
fix extended partition mapping
The linux kernel maps the extended partition only
so that LILO can be installed there. The length is always set
to two sectors to allow this, and most tools know to ignore the
device. kpartx was mapping the entire extended partition, then
stacking the logical partitions on top of it. This presented
a device that looked like an entirely separate disk that
contains only the logical partitions. This patch fixes kpartx
to conform with the normal Linux behavior.
Wang Sheng-Hui [Sat, 5 Jan 2013 19:36:07 +0000 (20:36 +0100)]
libmpathpersist: correct function description in mpath_persist.h and man file
* For mpath_persistent_reserve_out, we have "#define MPATH_PRTPE_EA_RO
0x06". Correct the description of the function prototype
mpath_persistent_reserve_out in mpath_persist.h and man file.
* Correct typo in the description for the function prototype
mpath_persistent_reserve_in in mpath_persist.h
Guangyu Sun [Sat, 22 Dec 2012 08:40:59 +0000 (09:40 +0100)]
libmultipath: fix flush_on_last_del config handlers
Now flush_on_last_del cannot be set to FLUSH_DISABLED if its default
value is FLUSH_ENABLED. This patch fixes the issue.
Peter Gervai [Sun, 30 Sep 2012 20:48:37 +0000 (22:48 +0200)]
iet prioritizer fix
- revert the path weight allocation : heavy weight for the
preferred path, light for others
- mention the original author's name
- embed a short usage documentation
Benjamin Marzinski [Mon, 20 Aug 2012 22:26:46 +0000 (17:26 -0500)]
multipath: fix setting sysfs fc timeout parameters
Multipath was accidentally trying to write to the directory where
dev_loss_tmo and fast_io_fail_tmo were located instead of to the files
themselves. Also, if dev_loss_tmo was unset, it was trying to set it
to 0. Finally, it wasn't correctly checking for errors in setting
the files. This patch fixes these issues.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Mon, 20 Aug 2012 22:19:26 +0000 (17:19 -0500)]
multipath: add wwids_file multipath.conf option
This patch adds a wwids_file multipath.conf option, so that users can
move the wwids file from its default location at /etc/multipath/wwids.
It also corrects the default bindings file location in the multipath.conf
manpage and makes the bindings_file value always print out when you
display the configuration, like is done with the other default values.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Fri, 27 Jul 2012 20:55:24 +0000 (15:55 -0500)]
multipath: check if a device belongs to multipath
This patch adds a new multipath option "-c", which checks if a device
belongs to multipath. This can be done during the add uevent for the
device, before the multipath device has even been created. This allows
udev to be able to handle multipath path devices differently. To do
this multipath now keeps track of the wwids of all previously created
multipath devices in /etc/multipath/wwids. The file creating and
editing code from alias.[ch] has been split out into file.[ch]. When a
device is checked to see if it's a multipath path, its wwid is
compared against the ones in the wwids file. Also, since the
uid_attribute may not have been added to the udev database entry for
the device if this is called in a udev rule, when using the "-c" option,
get_uid will now also check the process' environment variables for the
uid_attribute.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Fri, 27 Jul 2012 20:56:19 +0000 (15:56 -0500)]
multipath: add followover failback mode
This patch adds a new failback mode, followover, to deal with multiple
computers accessing the same active/passive storage devices. In these
cases, if only one node loses access to the primary paths, it will
force a trespass to the secondary paths. If the nodes are configured
with immediate failback, the other nodes will trespass back to the
primary paths, and the machines will ping-pong the storage. If the
nodes are configured with manual failback, this won't happen. However
when the primary path is restored on the node that lost access to it,
the nodes won't automatically failback to it. In followover mode, they
will.
Followover mode works by only failing back when a path comes back online
from a pathgroup that previously had no working paths. For this to
work, the paths need an additional attribute, chkrstate. This is just like
the path state, except it is not updated when the path's state is changed
by the kernel, only when the path checker function sees that the path is
down. This is necessary because when a trespass occurs, all the outstanding
IO to the previously active paths will fail, and the kernel will mark the
path as down. But for failback to happen in followover mode, the paths must
actually be down, not just in a ghost state.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
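The followover decision described above might be modelled like this. It is a rough sketch under stated assumptions: the enum, the function name and the array-of-states representation are invented for illustration, and the array is assumed to hold the checker-observed state (chkrstate) of each path in the pathgroup.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state { SKETCH_DOWN, SKETCH_UP, SKETCH_GHOST };

/* Returns 1 if the path at index 'restored' has come back in a pathgroup
 * whose other paths were all seen as down by the path checker, i.e. the
 * group previously had no working paths. */
int followover_should_failback(const enum sketch_state *chkrstate,
                               size_t npaths, size_t restored)
{
    assert(chkrstate != NULL && restored < npaths);
    for (size_t i = 0; i < npaths; i++) {
        if (i == restored)
            continue;
        if (chkrstate[i] != SKETCH_DOWN)
            return 0;   /* a ghost or up path means the group was not dead */
    }
    return 1;
}
```

A ghost path counts as "not down" here, which reflects the requirement that the paths must actually be down rather than merely in a ghost state.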
Benjamin Marzinski [Fri, 27 Jul 2012 20:57:18 +0000 (15:57 -0500)]
multipath: fix cciss device names
When we're looking for cciss devices in sysfs, they have a "!" not a "/".
If users run multipath on a cciss device using its devnode name,
/dev/cciss/cXdY, multipath should convert that to the sysfs name.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
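The name conversion amounts to replacing each "/" with "!". A minimal sketch, assuming the input is the device name relative to /dev (for example "cciss/c0d0"); devnode_to_sysfs_name() is a hypothetical helper, not the function used in multipath.

```c
#include <assert.h>
#include <string.h>

/* Rewrites name in place so that it matches the sysfs entry,
 * e.g. "cciss/c0d0" becomes "cciss!c0d0". */
void devnode_to_sysfs_name(char *name)
{
    char *p = name;

    assert(name != NULL);
    while ((p = strchr(p, '/')) != NULL)
        *p = '!';
}
```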
Benjamin Marzinski [Fri, 27 Jul 2012 20:54:29 +0000 (15:54 -0500)]
multipath: remove callout code
Since nothing is using the callout code anymore, I've removed it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Jun'ichi Nomura [Fri, 17 Aug 2012 08:40:39 +0000 (17:40 +0900)]
multipath-tools: prevent unexpected swapping of underlying LUNs
When you want to rename a multipath device
but the new alias is already used by another multipath device,
multipath-tools mistakenly reloads the table for the original multipath device
into the other multipath device.
That could lead to very bad results, such as I/O errors and data corruption.
This patch checks for such a condition and gives up renaming with an error log.
For example, suppose you have the following 'bindings' file:
# cat /etc/multipath/bindings
mpatha 212140084abcd0000
mpathb 212150084abcd0000
and a logical volume 'VG/LV0' on top of mpathb,
which is on top of /dev/sde(8:64) and /dev/sdk(8:160):
# dmsetup ls --tree
mpatha (253:1)
 ├─ (8:144)
 └─ (8:48)
VG-LV0 (253:2)
 └─mpathb (253:0)
    ├─ (8:160)
    └─ (8:64)
Then you decide to swap their names and change the 'bindings' as follows:
# cat /etc/multipath/bindings
mpathb 212140084abcd0000
mpatha 212150084abcd0000
you'll get this after 'service multipathd reload':
# dmsetup ls --tree
mpatha (253:1)
 ├─ (8:160)
 └─ (8:64)
VG-LV0 (253:2)
 └─mpathb (253:0)
    ├─ (8:144)
    └─ (8:48)
Now you suddenly have 'VG/LV0' on top of /dev/sdd(8:48) and /dev/sdj(8:144),
which is obviously wrong and will corrupt data if you write to 'VG/LV0'.
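The check described above might look roughly like this. It is a sketch under assumptions: struct sketch_map and alias_in_use_by_other are hypothetical names, and the real patch works on the device-mapper state rather than on a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of an existing multipath map: its alias and WWID. */
struct sketch_map {
    const char *alias;
    const char *wwid;
};

/*
 * Return true if `new_alias` is already taken by an existing map whose
 * WWID differs from `wwid`. In that case the rename should be refused
 * and an error logged, instead of reloading the table into the other map.
 */
static bool alias_in_use_by_other(const struct sketch_map *maps, size_t nmaps,
                                  const char *new_alias, const char *wwid)
{
    for (size_t i = 0; i < nmaps; i++) {
        if (strcmp(maps[i].alias, new_alias) == 0 &&
            strcmp(maps[i].wwid, wwid) != 0)
            return true;
    }
    return false;
}
```

With the bindings from the example, renaming the map with WWID 212140084abcd0000 to mpathb is rejected, because mpathb is still in use by 212150084abcd0000.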
Moger, Babu [Tue, 10 Jul 2012 19:12:13 +0000 (19:12 +0000)]
multipath: retry the DID_SOFT_ERROR for rdac checker commands
Sometimes we have seen immediate path failures for DID_SOFT_ERROR status. Retry on it, as is already done for other statuses.
This effectively adds 5 retries.
Here are the messages:
Jun 4 17:46:42 ictc-billy kernel: mpt2sas0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Jun 4 17:46:42 ictc-billy kernel: sd 1:0:0:47: [sdn] Unhandled error code
Jun 4 17:46:42 ictc-billy kernel: sd 1:0:0:47: [sdn] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Jun 4 17:46:42 ictc-billy kernel: sd 1:0:0:47: [sdn] CDB: Write(10): 2a 00 00 06 e4 a0 00 00 20 00
Jun 4 17:46:42 ictc-billy kernel: device-mapper: multipath: Failing path 8:208.
Jun 4 17:46:42 ictc-billy multipathd: 8:208: mark as failed
Jun 4 17:46:42 ictc-billy multipathd: mpathat: remaining active paths: 1
Signed-off-by: Babu Moger <babu.moger@netapp.com>
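The retry behaviour can be sketched as below. This is only an illustration: the callback type, the function names and the stub transport are hypothetical, and the real checker issues its commands through the SG_IO interface. The constant 0x0b is the value of DID_SOFT_ERROR in the Linux SCSI headers, and the limit of 5 is taken from the commit message.

```c
#define SKETCH_DID_OK         0x00
#define SKETCH_DID_SOFT_ERROR 0x0b /* DID_SOFT_ERROR in the Linux SCSI headers */
#define SKETCH_MAX_RETRIES    5    /* "5 retries", per the commit message */

/* Hypothetical callback: issues one checker command, returns the host byte. */
typedef int (*sketch_issue_fn)(void *ctx);

/* Reissue the command while the host byte is DID_SOFT_ERROR, up to the limit. */
static int issue_with_retry(sketch_issue_fn issue, void *ctx)
{
    int host_status = issue(ctx);

    for (int retry = 0;
         retry < SKETCH_MAX_RETRIES && host_status == SKETCH_DID_SOFT_ERROR;
         retry++)
        host_status = issue(ctx);

    return host_status;
}

/* Example stub: reports DID_SOFT_ERROR until the counter in ctx hits zero. */
static int flaky_issue(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_DID_SOFT_ERROR;
    }
    return SKETCH_DID_OK;
}
```

A transient soft error that clears within the retry budget no longer marks the path as failed; a persistent one is still reported to the caller.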
Benjamin Marzinski [Mon, 11 Jun 2012 21:32:35 +0000 (16:32 -0500)]
multipath: fix libudev bug in sysfs_get_tgt_nodename
In a recent patch, I introduced a bug into sysfs_get_tgt_nodename().
multipath must not unreference the target udevice before it copies the
tgt_nodename to another location, otherwise the value pointer will be
pointing at freed memory.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Fri, 8 Jun 2012 17:28:28 +0000 (12:28 -0500)]
multipath: remove unnecessary fds from alias code
Originally, the alias code duped the bindings file fd so that a stream could
be opened on it, and then closed without closing the original fd. Later,
closing the stream was moved to the end of the function, to avoid a locking
bug. Because of this, there isn't any point to duping the fd. Also, since
the stream is still opened when the original fd is used, the stream
should be flushed after it's done being used.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Tue, 5 Jun 2012 23:04:36 +0000 (18:04 -0500)]
multipath: libudev cleanup and bugfixes
get_refwwid wasn't working anymore, since it wasn't setting the path's udevice.
Also, cli_add_path was dereferencing a NULL pointer (pp). Finally, there were
a number of places where udev devices weren't getting unreferenced when they
should have been, causing memory leaks. This patch cleans these up.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Fri, 25 May 2012 04:57:43 +0000 (23:57 -0500)]
multipath: Fix warnings from stricter compile options.
With stricter compilation options, multipath printed a number of
warnings during compilation. Some of them were actual bugs. Others
couldn't cause any problems. This patch cleans up all the new
warnings.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Fri, 25 May 2012 04:57:42 +0000 (23:57 -0500)]
multipath: Build with standard rpm cflags
This patch makes multipath build with the standard Red Hat rpm cflags, which
can help catch some code errors.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Benjamin Marzinski [Wed, 23 May 2012 21:42:20 +0000 (16:42 -0500)]
multipath: Some device configuration changes for NetApp LUNs
NetApp has asked for these configuration changes.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>