google cloud platform - Dataflow pipline "lost contact with the service" -


i'm running trouble apache beam pipline on google cloud dataflow.

the pipeline simple: reading json gcs, extracting text nested fields, writing gcs.

it works fine when testing smaller subset of input files when run on full data set, following error (after running fine through around 260m items).

somehow "worker lost contact service"

  (8662a188e74dae87): workflow failed. causes: (95e9c3f710c71bc2): s04:readfromtextwithfilename/read+flatmap(extract_text_from_raw)+removelinebreaks+formattext+writetext/write/writeimpl/writebundles/do+writetext/write/writeimpl/pair+writetext/write/writeimpl/windowinto(windowintofn)+writetext/write/writeimpl/groupbykey/reify+writetext/write/writeimpl/groupbykey/write failed., (da6389e4b594e34b): work item attempted 4 times without success. each time worker lost contact service. work item attempted on:    extract-tags-150110997000-07261602-0a01-harness-jzcn,   extract-tags-150110997000-07261602-0a01-harness-828c,   extract-tags-150110997000-07261602-0a01-harness-3w45,   extract-tags-150110997000-07261602-0a01-harness-zn6v 

the stacktrace shows failed update work status/progress reporting thread got error error:

exception in worker loop: traceback (most recent call last): file "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 776, in run deferred_exception_details=deferred_exception_details) file "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 629, in do_work exception_details=exception_details) file "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 168, in wrapper return fun(*args, **kwargs) file "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 490, in report_completion_status exception_details=exception_details) file "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 298, in report_status work_executor=self._work_executor) file "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 333, in report_status self._client.projects_locations_jobs_workitems.reportstatus(request)) file "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 467, in reportstatus config, request, global_params=global_params) file "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 723, in _runmethod return self.processhttpresponse(method_config, http_response, request) file "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 729, in processhttpresponse self.__processhttpresponse(method_config, http_response, request)) file "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 600, in __processhttpresponse http_response.request_url, method_config, request) httperror: httperror accessing <https://dataflow.googleapis.com/v1b3/projects/qollaboration-live/locations/us-central1/jobs/2017-07-26_16_02_36-1885237888618334364/workitems:reportstatus?alt=json>: response: <{'status': '400', 'content-length': '360', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'origin, x-origin, referer', 'server': 'esf', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'wed, 26 jul 2017 23:54:12 gmt', 'x-frame-options': 'sameorigin', 'content-type': 'application/json; charset=utf-8'}>, content <{ "error": { "code": 400, "message": "(7f8a0ec09d20c3a3): failed publish result of work update. causes: (7f8a0ec09d20cd48): failed update work status. causes: (afa1cd74b2e65619): failed update work status., (afa1cd74b2e65caa): work \"6306998912537661254\" not leased (or lease lost).", "status": "invalid_argument" } } > 

and finally:

httperror: httperror accessing <https://dataflow.googleapis.com/v1b3/projects/[projectid-redacted]/locations/us-central1/jobs/2017-07-26_18_28_43-10867107563808864085/workitems:reportstatus?alt=json>: response: <{'status': '400', 'content-length': '358', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'origin, x-origin, referer', 'server': 'esf', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'thu, 27 jul 2017 02:00:10 gmt', 'x-frame-options': 'sameorigin', 'content-type': 'application/json; charset=utf-8'}>, content <{ "error": { "code": 400, "message": "(5845363977e915c1): failed publish result of work update. causes: (5845363977e913a8): failed update work status. causes: (44379dfdb8c2b47): failed update work status., (44379dfdb8c2e88): work \"9100669328839864782\" not leased (or lease lost).", "status": "invalid_argument" } } > @ __processhttpresponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:600) @ processhttpresponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:729) @ _runmethod (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:723) @ reportstatus (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py:467) @ report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py:333) @ report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:298) @ report_completion_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:490) @ wrapper (/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py:168) @ do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:629) @ run (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:776) 

this looks error data flow internals me. can confirm? there workarounds?

the httperror typically appears after workflow has failed , part of failure/teardown process.

it looks there others error reported in pipeline, such following. note if same elements fail 4 times marked failing.

try looking stack traces section in ui identify other errors , stack traces. since occurs on larger dataset, consider possibility of being malformed elements exist in larger dataset.


Comments

Popular posts from this blog

php - Vagrant up error - Uncaught Reflection Exception: Class DOMDocument does not exist -

vue.js - Create hooks for automated testing -

Add new key value to json node in java -