
[FLOC-3739] Failover acceptance test enabled #2546

Open: wants to merge 10 commits into master
5 changes: 0 additions & 5 deletions admin/acceptance.py
@@ -92,11 +92,6 @@ def get_trial_environment(cluster):
'FLOCKER_ACCEPTANCE_VOLUME_BACKEND': cluster.dataset_backend.name,
'FLOCKER_ACCEPTANCE_API_CERTIFICATES_PATH':
cluster.certificates_path.path,
'FLOCKER_ACCEPTANCE_HOSTNAME_TO_PUBLIC_ADDRESS': json.dumps({
node.private_address: node.address
for node in cluster.agent_nodes
if node.private_address is not None
}),
'FLOCKER_ACCEPTANCE_DEFAULT_VOLUME_SIZE': bytes(
cluster.default_volume_size
),
1 change: 1 addition & 0 deletions docs/releasenotes/index.rst
@@ -16,6 +16,7 @@ Next Release
* The new :ref:`CloudFormation installer <cloudformation>` has been made available, to provide a far simpler installation experience for users on AWS.
* The :ref:`Flocker plugin for Docker <plugin>` should support the direct volume listing and inspection functionality added to Docker 1.10.
* Fixed a regression that caused block device agents to poll backend APIs like EBS too frequently in some circumstances.
* Datasets can now be moved off of shutdown EC2 instances. Previously they could only be moved off of terminated instances.
Contributor commented:

Is there anything that prevents someone trying this on Rackspace?

Contributor (author) commented:

I've tried it on Rackspace manually. It doesn't work due to Rackspace bugs. It does work on OpenStack elsewhere (at least on Swisscom's cluster) but I don't want to claim that publicly until we can enable the acceptance tests.

Contributor commented:

Right, but is it a matter of hoping that no one tries it? If so, are we happy with that?


This Release
============
32 changes: 22 additions & 10 deletions flocker/acceptance/endtoend/test_dataset.py
@@ -5,9 +5,11 @@
"""

from uuid import UUID
from unittest import SkipTest, skipIf
from unittest import SkipTest
from datetime import timedelta

from ipaddress import ip_address

from testtools import run_test_with

from twisted.internet import reactor
@@ -18,7 +20,7 @@

from ..testtools import (
require_cluster, require_moving_backend, create_dataset, DatasetBackend,
get_backend_api, verify_socket
get_backend_api, verify_socket, is_public_ip,
)


@@ -109,18 +111,19 @@ def not_exists():
created.addCallback(delete_dataset)
return created

@skipIf(True,
"Shutting down a node invalidates a public IP, which breaks all "
"kinds of things. So skip for now.")
@require_moving_backend
@run_test_with(async_runner(timeout=timedelta(minutes=6)))
@require_cluster(2)
def test_dataset_move_from_dead_node(self, cluster):
# Rackspace is buggy so this doesn't work there; once we're off
# Rackspace we should re-enable it for everything (FLOC-4001):
@require_cluster(2, required_backend=DatasetBackend.aws)
def test_dataset_move_from_dead_node(self, cluster, backend):
"""
A dataset can be moved from a dead node to a live node.

All attributes, including the maximum size, are preserved.
"""
# We could use the backend argument, but we're going to drop it as
# part of FLOC-4001, so make our own.
api = get_backend_api(self, cluster.cluster_uuid)
if not ICloudAPI.providedBy(api):
raise SkipTest(
@@ -139,9 +142,18 @@ def test_dataset_move_from_dead_node(self, cluster):

def startup_node(node_id):
api.start_node(node_id)
# Wait for node to boot up:; we presume Flocker getting going after
# SSH is available will be pretty quick:
return loop_until(reactor, verify_socket(node.public_address, 22))
# Wait for node to boot up:
d = loop_until(
reactor, lambda: node_id in api.list_live_nodes())

# Wait for it to be accessible over SSH, on the theory that this
# means it's ready to be used:
def wait_for_ssh(_):
ips = [ip_address(i) for i in api.list_live_nodes()[node_id]]
public_ip = [unicode(i) for i in ips if is_public_ip(i)][0]
return verify_socket(public_ip, 22)
d.addCallback(wait_for_ssh)
return d

# Once created, shut down origin node and then request to move the
# dataset to node2:
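The `wait_for_ssh` step above filters the node's live addresses down to a public one before probing port 22. A minimal Python 3 sketch of that selection, using a hypothetical `FakeCloudAPI` in place of a real `ICloudAPI` backend (names other than `is_public_ip` are invented for illustration):

```python
import ipaddress


def is_public_ip(ip):
    # Mirrors the helper this PR adds to testtools.py: an address is
    # "public" if it is not private, link-local, or loopback.
    return not any([ip.is_private, ip.is_link_local, ip.is_loopback])


class FakeCloudAPI(object):
    """Hypothetical in-memory stand-in for ICloudAPI.list_live_nodes."""
    def __init__(self, nodes):
        self._nodes = nodes  # {node_id: [IP strings]}

    def list_live_nodes(self):
        return self._nodes


def public_ip_for(api, node_id):
    # Pick the node's first public address, as wait_for_ssh does above.
    ips = [ipaddress.ip_address(i) for i in api.list_live_nodes()[node_id]]
    return [str(i) for i in ips if is_public_ip(i)][0]


api = FakeCloudAPI({u"i-123": [u"8.8.8.8", u"10.0.0.5"]})
print(public_ip_for(api, u"i-123"))  # prints 8.8.8.8, not the 10.x address
```

Note that `ipaddress` classifies the RFC 5737 documentation ranges (e.g. 203.0.113.0/24) as private, so a genuinely routable address is used here.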
1 change: 0 additions & 1 deletion flocker/acceptance/endtoend/test_installer.py
@@ -326,7 +326,6 @@ def _cleanup_flocker(self):
control_node=self.control_node_ip.encode('ascii'),
certificates_path=local_certs_path,
num_agent_nodes=2,
hostname_to_public_address={},
username='user1',
)
d.addCallback(
67 changes: 52 additions & 15 deletions flocker/acceptance/testtools.py
@@ -18,6 +18,8 @@

from docker.tls import TLSConfig

from ipaddress import ip_address

from twisted.web.http import OK, CREATED
from twisted.python.filepath import FilePath
from twisted.python.constants import Names, NamedConstant
@@ -45,6 +47,7 @@
from ..apiclient import FlockerClient, DatasetState
from ..node.script import get_backend, get_api
from ..node import dockerpy_client
from ..node.agents.blockdevice import ICloudAPI

from .node_scripts import SCRIPTS as NODE_SCRIPTS

@@ -826,7 +829,7 @@ def get_file(self, node, path):

def connected_cluster(
reactor, control_node, certificates_path, num_agent_nodes,
hostname_to_public_address, username='user',
username='user',
):
cluster_cert = certificates_path.child(b"cluster.crt")
user_cert = certificates_path.child(
@@ -869,24 +872,63 @@ def failed_query(failure):
failed_query)
return d
agents_connected = loop_until(reactor, nodes_available)
agents_connected.addCallback(lambda _: _add_nodes(cluster))
return agents_connected


def is_public_ip(ip):
"""
:param IPAddress ip: An IP address.
:return: Boolean which is true if it is a public address.
"""
return not any([ip.is_private, ip.is_link_local, ip.is_loopback])


def _add_nodes(cluster):
"""
Configure the ``Node`` objects for a newly created ``Cluster`` whose
nodes are known to be alive.

:param Cluster cluster: Cluster that still needs nodes set.
:return: ``cluster`` updated with appropriate ``nodes`` set.
"""
# By default we just trust the address returned by Flocker
def default_get_public_ip(address):
return address

# Extract node hostnames from API that lists nodes. Currently we
# happen know these in advance, but in FLOC-1631 node identification
# will switch to UUIDs instead.
agents_connected.addCallback(lambda _: cluster.current_nodes())
try:
backend = get_backend_api(None, cluster.cluster_uuid)
except SkipTest:
# Can't load backend, will have to trust Flocker's reported IPs.
get_public_ip = default_get_public_ip
else:
if ICloudAPI.providedBy(backend):
node_ips = list(set(ip_address(i) for i in ips)
for ips in backend.list_live_nodes().values())

def get_public_ip(address):
for ips in node_ips:
if ip_address(address) in ips:
return [unicode(ip) for ip in ips
if is_public_ip(ip)][0]
raise ValueError(
"Couldn't find address in cloud API reported IPs")
else:
get_public_ip = default_get_public_ip

def node_from_dict(node):
reported_hostname = node["host"]
public_address = hostname_to_public_address.get(
reported_hostname, reported_hostname)
public_address = get_public_ip(reported_hostname)
return Node(
uuid=node[u"uuid"],
public_address=public_address.encode("ascii"),
reported_hostname=reported_hostname.encode("ascii"),
)
agents_connected.addCallback(lambda nodes: cluster.set(
"nodes", map(node_from_dict, nodes)))
return agents_connected

d = cluster.current_nodes()
d.addCallback(
lambda nodes: cluster.set("nodes", map(node_from_dict, nodes)))
return d


def _get_test_cluster(reactor):
Expand Down Expand Up @@ -914,16 +956,11 @@ def _get_test_cluster(reactor):
certificates_path = FilePath(
environ["FLOCKER_ACCEPTANCE_API_CERTIFICATES_PATH"])

hostname_to_public_address_env_var = environ.get(
"FLOCKER_ACCEPTANCE_HOSTNAME_TO_PUBLIC_ADDRESS", "{}")
hostname_to_public_address = json.loads(hostname_to_public_address_env_var)

return connected_cluster(
reactor,
control_node,
certificates_path,
num_agent_nodes,
hostname_to_public_address
)


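The `get_public_ip` closure built in `_add_nodes` maps a Flocker-reported (possibly private) address to the matching node's public IP. A standalone Python 3 sketch of the same lookup, with a hard-coded live-nodes mapping standing in for a real `ICloudAPI` backend (`make_get_public_ip` is an invented wrapper name):

```python
from ipaddress import ip_address


def is_public_ip(ip):
    # Same predicate as the helper added in this PR.
    return not any([ip.is_private, ip.is_link_local, ip.is_loopback])


def make_get_public_ip(live_nodes):
    # live_nodes: {node_id: iterable of IP strings}, the shape
    # ICloudAPI.list_live_nodes returns after this change.
    node_ips = [set(ip_address(i) for i in ips)
                for ips in live_nodes.values()]

    def get_public_ip(address):
        # Find the node whose address set contains the reported
        # address, then return that node's public IP.
        for ips in node_ips:
            if ip_address(address) in ips:
                return [str(ip) for ip in ips if is_public_ip(ip)][0]
        raise ValueError("Couldn't find address in cloud API reported IPs")

    return get_public_ip


get_public_ip = make_get_public_ip({u"i-1": [u"8.8.8.8", u"10.0.0.5"]})
print(get_public_ip(u"10.0.0.5"))  # prints 8.8.8.8
```

If the reported address matches no live node, the lookup raises `ValueError`, matching the behaviour in the diff above.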
9 changes: 6 additions & 3 deletions flocker/node/agents/blockdevice.py
@@ -1121,8 +1121,10 @@ def list_live_nodes():
This is used to figure out which nodes are dead, so that other
nodes can do the detach.

:returns: A collection of ``unicode`` compute instance IDs, compatible
with those returned by ``IBlockDeviceAPI.compute_instance_id``.
:returns: A mapping of ``unicode`` compute instance IDs
(compatible with those returned by
``IBlockDeviceAPI.compute_instance_id``) to the IPs of those
nodes, also as ``unicode``.
"""

def start_node(node_id):
@@ -1647,7 +1649,8 @@ def is_existing_block_device(dataset_id, path):
pass

if ICloudAPI.providedBy(self._underlying_blockdevice_api):
live_instances = self._underlying_blockdevice_api.list_live_nodes()
live_instances = list(
self._underlying_blockdevice_api.list_live_nodes())
else:
# Can't know accurately who is alive and who is dead:
live_instances = None
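Because ``list_live_nodes`` now returns a mapping rather than a bare collection, callers that only need instance IDs, like the ``list(...)`` wrapper added above, keep working: iterating a dict yields its keys. A small Python 3 sketch with a hypothetical in-memory implementation:

```python
def list_live_nodes():
    # Hypothetical implementation returning the new mapping shape:
    # instance IDs mapped to lists of that node's IP addresses.
    return {u"i-aaa": [u"8.8.8.8", u"10.0.0.1"],
            u"i-bbb": [u"10.0.0.2"]}


# Old callers that only need instance IDs keep working via list():
live_instances = list(list_live_nodes())
print(sorted(live_instances))  # prints ['i-aaa', 'i-bbb']
```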
7 changes: 5 additions & 2 deletions flocker/node/agents/cinder.py
@@ -689,8 +689,11 @@ def get_device_path(self, blockdevice_id):

# ICloudAPI:
def list_live_nodes(self):
return list(server.id for server in self.nova_server_manager.list()
if server.status == u'ACTIVE')
return {server.id:
list(map(
unicode, _extract_nova_server_addresses(server.addresses)))
for server in self.nova_server_manager.list()
if server.status == u'ACTIVE'}

def start_node(self, node_id):
server = self.nova_server_manager.get(node_id)
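Nova reports ``server.addresses`` as a mapping of network name to address records; ``_extract_nova_server_addresses`` (not shown in this diff) presumably flattens that structure to plain address strings. A hedged Python 3 sketch under that assumption (``extract_addresses`` is an invented name, not the real helper):

```python
def extract_addresses(addresses):
    # addresses: Nova's server.addresses structure, e.g.
    # {"private": [{"addr": "10.0.0.2", "version": 4}], "public": [...]}
    result = []
    for records in addresses.values():
        for record in records:
            result.append(record["addr"])
    return result


addrs = extract_addresses({
    u"private": [{u"addr": u"10.0.0.2", u"version": 4}],
    u"public": [{u"addr": u"8.8.8.8", u"version": 4}],
})
print(sorted(addrs))  # prints ['10.0.0.2', '8.8.8.8']
```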
9 changes: 6 additions & 3 deletions flocker/node/agents/ebs.py
@@ -719,7 +719,7 @@ def _wait_for_volume_state_change(operation,
start_time = time.time()
poll_until(
lambda: _reached_end_state(
operation, volume, update, time.time() - start_time, timeout
operation, volume, update, time.time() - start_time, timeout
),
itertools.repeat(1)
)
@@ -1363,11 +1363,14 @@ def get_device_path(self, blockdevice_id):
def list_live_nodes(self):
instances = self.connection.instances.filter(
Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])
return list(unicode(instance.id) for instance in instances)
return {unicode(instance.id):
[unicode(instance.public_ip_address),
unicode(instance.private_ip_address)]
for instance in instances}

@boto3_log
def start_node(self, node_id):
self.connection.start_instances(instance_ids=[node_id])
self.connection.instances.filter(InstanceIds=[node_id]).start()


def aws_from_configuration(
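The EBS implementation now builds the same ID-to-IPs mapping from boto3 instance attributes. A Python 3 sketch using stand-in instance objects rather than a live AWS connection (the attribute names match boto3's EC2 ``Instance`` resource; the stand-in itself is invented):

```python
from collections import namedtuple

# Stand-in for a boto3 EC2 Instance resource; only the attributes
# used by list_live_nodes are modelled here.
Instance = namedtuple("Instance",
                      ["id", "public_ip_address", "private_ip_address"])


def list_live_nodes(instances):
    # Same shape as the PR's EBS implementation: instance ID mapped
    # to its public and private IPs, all as text.
    return {str(instance.id): [str(instance.public_ip_address),
                               str(instance.private_ip_address)]
            for instance in instances}


nodes = list_live_nodes([Instance("i-1", "8.8.8.8", "10.0.0.1")])
print(nodes)  # prints {'i-1': ['8.8.8.8', '10.0.0.1']}
```

In real code the instances come from ``self.connection.instances.filter(...)`` with a running-state filter, as in the diff above.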
25 changes: 22 additions & 3 deletions flocker/node/agents/test/test_blockdevice.py
@@ -97,7 +97,7 @@
_backing_file_name,
)
from ....common.algebraic import tagged_union_strategy

from ....common import get_all_ips

from ... import run_state_change, in_parallel, ILocalState, IStateChange, NoOp
from ...testtools import (
@@ -754,7 +754,12 @@ def __init__(self, block_api, live_nodes=()):
self.live_nodes = live_nodes

def list_live_nodes(self):
return [self.compute_instance_id()] + list(self.live_nodes)
result = {self.compute_instance_id():
set(unicode(i) for i in get_all_ips()
if i != b"127.0.0.1")}
result.update({node: [u"10.1.1.{}".format(i)]
for i, node in enumerate(self.live_nodes)})
return result

def start_node(self, node_id):
return
@@ -5501,11 +5506,25 @@ def test_current_machine_is_live(self):
self.assertIn(self.api.compute_instance_id(), live))
return d

def test_current_machine_has_appropriate_ip(self):
"""
The machine's known IP is set for the current node.
"""
local_addresses = set(unicode(i) for i in get_all_ips()
if i != b"127.0.0.1")
d = self.async_cloud_api.list_live_nodes()
d.addCallback(
lambda live:
self.assertTrue(
Contributor commented: Does assertIn work here?

set(live[self.api.compute_instance_id()]).intersection(
local_addresses)))
return d

def test_list_live_nodes(self):
"""
Iterating over ``list_live_nodes`` yields ``unicode`` values.
"""
live_nodes = self.api.list_live_nodes()
live_nodes = list(self.api.list_live_nodes())
self.assertThat(live_nodes, AllMatch(IsInstance(unicode)))

return Tests