Re: help needed


Garrett Holmstrom

29 Mar 2012

1:48 p.m.

On Mar 28, 2012 9:06 PM, "Heherson Pagcaliwagan" herson@azneita.org wrote:

...

On Thu, Mar 29, 2012 at 12:22 AM, Dennis Gilmore dennis@ausil.us wrote:

...
I have uploaded an x86_64 image to us-east-1 in EC2; the AMI is ami-3fb16f56. For a reason I cannot yet figure out, the image boots fine but ssh will not let me connect. I've stopped the image, attached it to an F16 instance, and examined the disk, and it all looks fine; the ssh logs just say that the client disconnected.

I'd appreciate it if some people could have a look and see whether it's working for them, or help diagnose what exactly is going on.

Not sure if this is it, but I did not see a /root/.ssh/authorized_keys file.

This is expected. Look for it under /home/ec2-user instead.

Of course, if it isn't there either then there may be a problem. ;-)


Heherson Pagcaliwagan

30 Mar

12:39 a.m.


On Thu, Mar 29, 2012 at 8:48 PM, Garrett Holmstrom gholms@fedoraproject.org wrote:


This is expected. Look for it under /home/ec2-user instead.

Thanks, Garrett. It did not occur to me to check the entire contents of /home/; I relied solely on the web console's "Connect to Instance" hint. Will try again later :)

...

Of course, if it isn't there either then there may be a problem. ;-)

cloud mailing list cloud@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/cloud

Heherson Pagcaliwagan

12:53 a.m.


On Fri, Mar 30, 2012 at 7:39 AM, Heherson Pagcaliwagan herson@azneita.org wrote:

...

On Thu, Mar 29, 2012 at 8:48 PM, Garrett Holmstrom gholms@fedoraproject.org wrote:

Of course, if it isn't there either then there may be a problem. ;-)

Now this is fun. Yup, no authorized_keys file under /home/ec2-user.


Andy Grimm

9:54 p.m.


On Thu, Mar 29, 2012 at 7:53 PM, Heherson Pagcaliwagan herson@azneita.org wrote:


Now this is fun. Yup, no authorized_keys file under /home/ec2-user.

Right, I see the same thing. authorized_keys is not being populated. Here's my guess. On working F16, I see:

Mar 30 15:43:24 localhost cloud-init[748]: ci-info: lo : 1 127.0.0.1 255.0.0.0
Mar 30 15:43:24 localhost cloud-init[748]: ci-info: eth0 : 1 10.245.187.79 255.255.254.0 12:31:3d:01:b8:a1
Mar 30 15:43:24 localhost cloud-init[748]: ci-info: route-0: 0.0.0.0 10.245.186.1 0.0.0.0 eth0 UG
Mar 30 15:43:24 localhost cloud-init[748]: ci-info: route-1: 10.245.186.0 0.0.0.0 255.255.254.0 eth0 U
Mar 30 15:43:24 localhost cloud-init[748]: ci-info: route-2: 169.254.0.0 0.0.0.0 255.255.0.0 eth0 U

On F17, I see:

Mar 30 15:27:12 localhost cloud-init[543]: ci-info: route-0: 0.0.0.0 10.80.210.1 0.0.0.0 eth0 UG
Mar 30 15:27:12 localhost cloud-init[543]: ci-info: route-1: 10.80.210.0 0.0.0.0 255.255.254.0 eth0 U

Of those differences, I suspect that lack of a zeroconf route (169.254.0.0/16) is probably preventing access to the metadata service. Further, I believe the reason is related to the addition of NetworkManager in the F17 AMI (because the zeroconf route is typically added via the ifcfg-eth script, which NM does not run). Before I go hacking further, is there a particular reason that we switched to using NetworkManager in the F17 AMI? Would removing it be the wrong solution, and if so, is there a quick way to ensure that NM initializes a zeroconf route?

--Andy
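For anyone who wants to test the route hypothesis against their own boot logs, here is a minimal Python sketch. It assumes the ci-info route lines have the layout shown in the logs above (destination, gateway, netmask, interface, flags) and that the EC2 metadata service is at 169.254.169.254, which is the well-known address but is not stated in this thread. The function names are made up for illustration.

```python
import ipaddress

# Assumption: the well-known EC2 metadata address (not given in the thread).
METADATA_IP = ipaddress.ip_address("169.254.169.254")


def parse_ci_route(line):
    """Parse one 'ci-info: route-N: dest gateway genmask iface flags' line."""
    fields = line.split("ci-info:", 1)[1].split()
    # fields looks like: ['route-2:', '169.254.0.0', '0.0.0.0', '255.255.0.0', 'eth0', 'U']
    dest, gateway, genmask, iface = fields[1:5]
    network = ipaddress.ip_network(f"{dest}/{genmask}", strict=False)
    return network, gateway, iface


def routes_covering(lines, target=METADATA_IP):
    """Return the logged routes whose network contains target, most specific first."""
    matches = []
    for line in lines:
        if "ci-info: route-" not in line:
            continue  # skip interface lines such as 'ci-info: lo : ...'
        network, gateway, iface = parse_ci_route(line)
        if target in network:
            matches.append((network, gateway, iface))
    return sorted(matches, key=lambda m: m[0].prefixlen, reverse=True)


if __name__ == "__main__":
    f17_log = [
        "Mar 30 15:27:12 localhost cloud-init[543]: ci-info: route-0: 0.0.0.0 10.80.210.1 0.0.0.0 eth0 UG",
        "Mar 30 15:27:12 localhost cloud-init[543]: ci-info: route-1: 10.80.210.0 0.0.0.0 255.255.254.0 eth0 U",
    ]
    for network, gateway, iface in routes_covering(f17_log):
        print(network, "via", gateway, "on", iface)
```

With the F17 lines, only the default route matches, so traffic to the metadata address would go through the gateway rather than over a link-local route. Whether the gateway actually answers is something the routing table alone cannot tell you.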

Robyn Bergeron

10 p.m.


On Fri, Mar 30, 2012 at 1:54 PM, Andy Grimm agrimm@gmail.com wrote:


Of those differences, I suspect that lack of a zeroconf route (169.254.0.0/16) is probably preventing access to the metadata service. Further, I believe the reason is related to the addition of NetworkManager in the F17 AMI (because the zeroconf route is typically added via the ifcfg-eth script, which NM does not run). Before I go hacking further, is there a particular reason that we switched to using NetworkManager in the F17 AMI? Would removing it be the wrong solution, and if so, is there a quick way to ensure that NM initializes a zeroconf route?

Hmmm, I wonder if it's loosely related to this:

https://bugzilla.redhat.com/show_bug.cgi?id=802475

(See details in comments, it has something to do with something new in comps, libvirt, networkmanager, other fun stuff.)


Robyn Bergeron

10:09 p.m.


On Fri, Mar 30, 2012 at 2:00 PM, Robyn Bergeron robyn.bergeron@gmail.com wrote:


Hmmm, I wonder if it's loosely related to this:

https://bugzilla.redhat.com/show_bug.cgi?id=802475

(See details in comments, it has something to do with something new in comps, libvirt, networkmanager, other fun stuff.)

Actually, don't mind me, the stuff that changed changed very recently (between rc1 and rc2) so I think this wouldn't necessarily be the root of the problem, since we've been looking at this for a while.


Tim Flink

10:35 p.m.


On Fri, 30 Mar 2012 16:54:47 -0400 Andy Grimm agrimm@gmail.com wrote:

...

Of those differences, I suspect that lack of a zeroconf route (169.254.0.0/16) is probably preventing access to the metadata service. Further, I believe the reason is related to the addition of NetworkManager in the F17 AMI (because the zeroconf route is typically added via the ifcfg-eth script, which NM does not run). Before I go hacking further, is there a particular reason that we switched to using NetworkManager in the F17 AMI?

NM was added to core as a fix for an F17 bug where the network wouldn't come up by default in a minimal install: https://bugzilla.redhat.com/show_bug.cgi?id=693602

The decision was made to add NM to core because it doesn't add many extra packages and as NM adds more and more features, it doesn't make much sense to keep hacking the old network service in for minimal installs. The full details of "why" are in the bug, if you're interested.

maxamillion did a good job of summing up some of what you can do about removing or replacing NM on a regular system in his blog post: http://pseudogen.blogspot.com/2012/03/networkmanager-is-in-core-but-dont-fre...

...

Would removing it be the wrong solution, and if so, is there a quick way to ensure that NM initializes a zeroconf route?

That is certainly possible, the network service still works. It just made more sense to use NM for non-cloud minimal installs.

I'll leave the discussion of the best way to deal with NM/network to people who are far more qualified than I am. Just figured I would add in the answer to "why did this change?"

Tim

Andy Grimm

10:45 p.m.


On Fri, Mar 30, 2012 at 5:35 PM, Tim Flink tflink@redhat.com wrote:

...

On Fri, 30 Mar 2012 16:54:47 -0400 Andy Grimm agrimm@gmail.com wrote:

...
Of those differences, I suspect that lack of a zeroconf route (169.254.0.0/16) is probably preventing access to the metadata service. Further, I believe the reason is related to the addition of NetworkManager in the F17 AMI (because the zeroconf route is typically added via the ifcfg-eth script, which NM does not run). Before I go hacking further, is there a particular reason that we switched to using NetworkManager in the F17 AMI?

NM was added to core as a fix for an F17 bug where the network wouldn't come up by default in a minimal install: https://bugzilla.redhat.com/show_bug.cgi?id=693602

The decision was made to add NM to core because it doesn't add many extra packages and as NM adds more and more features, it doesn't make much sense to keep hacking the old network service in for minimal installs. The full details of "why" are in the bug, if you're interested.

Ok, I've been on the CC list for that bug for a long time, but I missed that they actually made a decision. I'll just bite my tongue on that one; at least it's not completely broken anymore.

...

maxamillion did a good job of summing up some of what you can do about removing or replacing NM on a regular system in his blog post: http://pseudogen.blogspot.com/2012/03/networkmanager-is-in-core-but-dont-fre...

Thanks for that link!

...

...
Would removing it be the wrong solution, and if so, is there a quick way to ensure that NM initializes a zeroconf route?

That is certainly possible, the network service still works. It just made more sense to use NM for non-cloud minimal installs.

I'll leave the discussion of the best way to deal with NM/network to people who are far more qualified than I am. Just figured I would add in the answer to "why did this change?"

As it turns out, I just booted an Ubuntu Oneiric instance, and it does not have a zeroconf route, but is still able to access the metadata service, so it looks like this was a red herring. Back to the drawing board.

--Andy


Andy Grimm

31 Mar

12:39 a.m.


Further attempting to localize the problem, I found a pickled form of the metadata in /var/lib/cloud/instances/i-325d4c56/obj.pkl which included the public ssh key, so it's definitely getting the data from the metadata service.
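A quick way to repeat this check without unpickling anything is to scan the file's raw bytes for something that looks like a public key. The sketch below is only an illustration: it assumes the key is stored as a contiguous ASCII string inside the pickle, it only looks for RSA and DSA keys, and the function name is made up. Scanning bytes avoids needing the cloudinit classes on the import path, and it avoids running `pickle.load` on a file you do not fully trust.

```python
import re

# Matches the type and base64 body of an OpenSSH public key, e.g. "ssh-rsa AAAA...".
# Assumption: only RSA/DSA keys are of interest here.
KEY_PATTERN = re.compile(rb"ssh-(?:rsa|dss) [A-Za-z0-9+/=]+")


def find_public_keys(path):
    """Return SSH public key strings found in the raw bytes of the file at path."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [match.group(0).decode("ascii") for match in KEY_PATTERN.finditer(data)]
```

Calling `find_public_keys` on the obj.pkl path mentioned above and getting a non-empty list back would support the same conclusion: the key reached the instance, so the problem is in how it gets written out.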


Andy Grimm

2:44 p.m.


SOLVED!

...

From /usr/share/doc/cloud-init-0.6.3/ChangeLog :

"read /etc/ssh/sshd_config for AuthorizedKeysFile rather than assuming ~/.ssh/authorized_keys (LP: #731849)"

The problem is that this change in cloud-init does not properly handle relative paths, which are documented in the sshd_config manpage as being relative to the user's home directory. So the quick fix was to change /etc/ssh/sshd_config from:

AuthorizedKeysFile .ssh/authorized_keys

to:

AuthorizedKeysFile %h/.ssh/authorized_keys

The more correct fix is in cloud-init, probably something like:

--- a/cloudinit/SshUtil.py 2012-03-31 09:28:42.598996936 -0400
+++ b/cloudinit/SshUtil.py 2012-03-31 09:40:47.758829938 -0400
@@ -155,6 +155,8 @@
     akeys = ssh_cfg.get("AuthorizedKeysFile", "%h/.ssh/authorized_keys")
     akeys = akeys.replace("%h", pwent.pw_dir)
     akeys = akeys.replace("%u", user)
+    if not akeys.startswith('/'):
+        akeys = os.path.join(pwent.pw_dir, akeys)
     authorized_keys = akeys
 except Exception:
     authorized_keys = '%s/.ssh/authorized_keys' % pwent.pw_dir

How do you want to handle this? Should I go ahead and file both RHBZ and LP issues for it?

--Andy
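The path handling described above can be tried outside cloud-init with a small standalone function. This is a sketch of the idea, not the code that is in cloud-init: the function name and arguments are hypothetical, and it handles only the %h, %u and %% tokens that the sshd_config manpage documents for AuthorizedKeysFile. It uses posixpath so that it behaves the same wherever it is run.

```python
import posixpath
import re


def resolve_authorized_keys(value, home, user):
    """Expand an AuthorizedKeysFile value the way sshd_config documents it.

    %h becomes the home directory, %u the user name, %% a literal percent
    sign, and a result that is still relative is taken relative to home.
    """
    tokens = {"%%": "%", "%h": home, "%u": user}
    # Single pass, so an escaped %% is never re-read as the start of a token.
    expanded = re.sub(r"%[%hu]", lambda match: tokens[match.group(0)], value)
    if not posixpath.isabs(expanded):
        expanded = posixpath.join(home, expanded)
    return expanded


if __name__ == "__main__":
    for setting in (".ssh/authorized_keys", "%h/.ssh/authorized_keys"):
        print(setting, "->", resolve_authorized_keys(setting, "/home/ec2-user", "ec2-user"))
```

Both the relative form and the %h form should come out as the same absolute path, which is why editing sshd_config works as a quick fix while the join on the home directory is the more general one.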


Andy Grimm

3:44 p.m.


In case somebody wants to do other testing, I applied the patch to cloud-init and made this AMI public:

ami-ab4698c2


Garrett Holmstrom

9:40 p.m.


On Mar 31, 2012 6:44 AM, "Andy Grimm" agrimm@gmail.com wrote:

How do you want to handle this? Should I go ahead and file both RHBZ and LP issues for it?

If you're willing to, please do so. Otherwise I can forward a RHBZ bug to Launchpad.

Thanks for figuring this out!
