Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=78…
Commit: 782091c04c4cd718d2279ec0ab31ced262adc7b2
Parent: 8788606adfa1f21f7fa036986ee6e49cb2f2ee78
Author: David Teigland <teigland(a)redhat.com>
AuthorDate: Tue Jul 27 13:50:14 2010 -0500
Committer: David Teigland <teigland(a)redhat.com>
CommitterDate: Tue Jul 27 16:39:19 2010 -0500
dlm_controld: fix plock checkpoint signatures
Commit e2ccbf90543cf1d163d1a067bf5a8ce049a9c134 for bz 578625
incorrectly used "p_count" (a count of plocks) in the
signature calculation. When plock_ownership is on, the plocks
under an owned resource are not copied into the checkpoint.
However, the node writing the checkpoint counts all these
owned plocks and factors the count into the signature. The
node reading the checkpoint does not get the plocks, so its
count of plocks is different, causing the signature calculation
to be different. It will then disable plock operations.
It would be very common for this to occur in practice, so the
impact is very high.
bz 618806
Signed-off-by: David Teigland <teigland(a)redhat.com>
---
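The mismatch described in the message above can be illustrated with a
small sketch. The struct ckpt_summary and the helper names sig_old and
sig_new are hypothetical and exist only for this illustration; only the
XOR expressions follow the patch. It assumes, as the message states, that
the writer counts plocks under owned resources while the reader never
receives them.

```c
#include <stdint.h>

/* Hypothetical summary of what one node saw while storing or
   retrieving a checkpoint; not a structure from dlm_controld. */
struct ckpt_summary {
	unsigned long long r_num_first;
	unsigned long long r_num_last;
	uint32_t r_count;	/* resources seen */
	uint32_t p_count;	/* plocks seen */
};

/* Calculation before the fix: the plock count is folded in. */
static uint32_t sig_old(const struct ckpt_summary *s)
{
	return (uint32_t)((0xFFFFFFFF & s->r_num_first) ^
			  (0xFFFFFFFF & s->r_num_last) ^
			  s->r_count ^ s->p_count);
}

/* Calculation after the fix: the plock count is left out. */
static uint32_t sig_new(const struct ckpt_summary *s)
{
	return (uint32_t)((0xFFFFFFFF & s->r_num_first) ^
			  (0xFFFFFFFF & s->r_num_last) ^
			  s->r_count);
}
```

With made-up numbers, a writer that counted 5 plocks and a reader that
received only 2 get different values from sig_old, while sig_new agrees
on both nodes as long as the resource numbers and r_count match.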
group/dlm_controld/plock.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/group/dlm_controld/plock.c b/group/dlm_controld/plock.c
index bf6ddfa..861a39b 100644
--- a/group/dlm_controld/plock.c
+++ b/group/dlm_controld/plock.c
@@ -1985,8 +1985,7 @@ void store_plocks(struct lockspace *ls, uint32_t *sig)
}
}
out:
- *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last) ^
- r_count ^ p_count;
+ *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last) ^ r_count;
log_group(ls, "store_plocks first %llu last %llu r_count %u "
"p_count %u sig %x",
@@ -2134,8 +2133,7 @@ void retrieve_plocks(struct lockspace *ls, uint32_t *sig)
out:
saCkptCheckpointClose(h);
- *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last)
- ^ r_count ^ p_count;
+ *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last) ^ r_count;
log_group(ls, "retrieve_plocks first %llu last %llu r_count %u "
"p_count %u sig %x",
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=86…
Commit: 86ebdb800354b29dc30ec9b72379c7c7e5a8db73
Parent: 31d140c5924e02e46670365cc1ea9977775bacdd
Author: David Teigland <teigland(a)redhat.com>
AuthorDate: Tue Jul 27 14:06:53 2010 -0500
Committer: David Teigland <teigland(a)redhat.com>
CommitterDate: Tue Jul 27 14:16:35 2010 -0500
dlm_controld: fix plock owner in checkpoints
The wrong plock resource owner is written into checkpoints
when plock_ownership is 0. This causes a node that mounts
the fs to have incorrect owner values, which cause the
plock operations to permanently hang.
This bug seems to have existed since the plock code was originally
copied into dlm_controld from gfs_controld. As part of the copy,
there were some small code changes. One was to always include the
resource owner in the checkpoint data, instead of only including it
when plock_ownership was 1. The owner was then written and read
incorrectly when plock_ownership was 0.
bz 618814
Signed-off-by: David Teigland <teigland(a)redhat.com>
---
group/dlm_controld/plock.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/group/dlm_controld/plock.c b/group/dlm_controld/plock.c
index 861a39b..d18d1f5 100644
--- a/group/dlm_controld/plock.c
+++ b/group/dlm_controld/plock.c
@@ -1924,7 +1924,9 @@ void store_plocks(struct lockspace *ls, uint32_t *sig)
(there should be no SYNCING plocks) */
list_for_each_entry(r, &ls->plock_resources, list) {
- if (r->owner == -1)
+ if (!cfgd_plock_ownership)
+ owner = 0;
+ else if (r->owner == -1)
continue;
else if (r->owner == our_nodeid)
owner = our_nodeid;
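The owner selection after this change can be sketched as a standalone
function. The name ckpt_owner and its parameters are hypothetical; the
real code does this inline in the store_plocks loop using
cfgd_plock_ownership and r->owner. The final else branch lies outside the
hunk shown above and is an assumption.

```c
/* Decide which owner value to store for one resource.
   Returns 1 and sets *owner if the resource is written to the
   checkpoint, or 0 if the resource is skipped. */
static int ckpt_owner(int plock_ownership, int r_owner, int our_nodeid,
		      int *owner)
{
	if (!plock_ownership)
		*owner = 0;		/* ownership off: always store 0 */
	else if (r_owner == -1)
		return 0;		/* skipped, like "continue" in the loop */
	else if (r_owner == our_nodeid)
		*owner = our_nodeid;
	else
		*owner = 0;		/* assumption: not visible in the hunk */
	return 1;
}
```

The point of the sketch is the ordering: the plock_ownership check comes
first, so with ownership off the stored owner is 0 regardless of what
r->owner holds.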
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=31…
Commit: 31d140c5924e02e46670365cc1ea9977775bacdd
Parent: 331df4574758f5b98f376bfa1d3df037c85c97e6
Author: David Teigland <teigland(a)redhat.com>
AuthorDate: Tue Jul 27 13:50:14 2010 -0500
Committer: David Teigland <teigland(a)redhat.com>
CommitterDate: Tue Jul 27 13:50:14 2010 -0500
dlm_controld: fix plock checkpoint signatures
Commit e2ccbf90543cf1d163d1a067bf5a8ce049a9c134 for bz 578625
incorrectly used "p_count" (a count of plocks) in the
signature calculation. When plock_ownership is on, the plocks
under an owned resource are not copied into the checkpoint.
However, the node writing the checkpoint counts all these
owned plocks and factors the count into the signature. The
node reading the checkpoint does not get the plocks, so its
count of plocks is different, causing the signature calculation
to be different. It will then disable plock operations.
It would be very common for this to occur in practice, so the
impact is very high.
bz 618806
Signed-off-by: David Teigland <teigland(a)redhat.com>
---
group/dlm_controld/plock.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/group/dlm_controld/plock.c b/group/dlm_controld/plock.c
index bf6ddfa..861a39b 100644
--- a/group/dlm_controld/plock.c
+++ b/group/dlm_controld/plock.c
@@ -1985,8 +1985,7 @@ void store_plocks(struct lockspace *ls, uint32_t *sig)
}
}
out:
- *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last) ^
- r_count ^ p_count;
+ *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last) ^ r_count;
log_group(ls, "store_plocks first %llu last %llu r_count %u "
"p_count %u sig %x",
@@ -2134,8 +2133,7 @@ void retrieve_plocks(struct lockspace *ls, uint32_t *sig)
out:
saCkptCheckpointClose(h);
- *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last)
- ^ r_count ^ p_count;
+ *sig = (0xFFFFFFFF & r_num_first) ^ (0xFFFFFFFF & r_num_last) ^ r_count;
log_group(ls, "retrieve_plocks first %llu last %llu r_count %u "
"p_count %u sig %x",