ClickHouse exception code 241



Memory limit (total) exceeded during insert leads to partially write #11546

Comments

dmitryikh commented Jun 9, 2020


Describe the bug
I use the Java ClickHouse connector to insert data into ClickHouse. At one point I inserted 4 rows and got this answer from ClickHouse:

(The memory seems to have been consumed by other queries.)

After that I found that 3 of the 4 records were inserted and one (the last in ORDER BY order) was not.

It seems that ClickHouse violates block insert atomicity in case of out-of-memory errors?

IMHO, a block write should be atomic: either all rows are inserted or none are.


alexey-milovidov commented Jun 9, 2020

Insertion of one block into a table of the MergeTree family is atomic. This is true in all ClickHouse versions.
Is it true that all records were sent in one block?

dmitryikh commented Jun 9, 2020

I didn’t dive into the Java ClickHouse driver, but I suppose this code should insert in one block:

We do inserts once per minute, and a single block can easily be detected by the same value in the updated_time column (default NOW()).

We bumped into the problem described above only once, when we got an out-of-memory error.

den-crane commented Jun 9, 2020

@dmitryikh Check that all 4 records belong to 1 partition (the table's PARTITION BY). Partitions break an insert into several blocks.
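
A minimal sketch of that effect (hypothetical table name): one INSERT that spans two partitions is written as two separate blocks/parts, so an out-of-memory error can leave one partition's part written and the other missing.

CREATE TABLE t_partitioned
(
    d Date,
    x UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY x;

-- 4 rows, but two different months, so the insert is split into two blocks
INSERT INTO t_partitioned VALUES ('2020-05-31', 1), ('2020-05-31', 2), ('2020-06-01', 3), ('2020-06-01', 4);

-- one part per partition; atomicity holds per block/part, not per statement
SELECT partition, name, rows
FROM system.parts
WHERE table = 't_partitioned' AND active;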

dmitryikh commented Jun 9, 2020

@den-crane, yes, you're right! I missed that.
Thank you.

I’m going to close the issue.

RonanMorgan commented Jul 17, 2020

I have a similar issue,
Caused by: java.lang.Throwable: Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 28.06 GiB (attempt to allocate chunk of 4382610 bytes), maximum: 28.06 GiB (version 20.5.2.7 (official build))

I am using the official JDBC driver to save about 1,500K rows, with PARTITION BY the starting date (and it's a two-day data set).

I tried to set min_insert_block_size_rows and max_insert_block_size (to 1024 and 128) in the users.xml configuration, in the default profile, but it didn't work.

Any idea? Do I have to create a new issue for that?

RonanMorgan commented Jul 17, 2020

OK, I fixed my issue by using the batch API of the JDBC driver instead of the writer one. I still don't understand the error log though (maybe there is an overhead when writing rows one by one).

den-crane commented Jul 17, 2020

@RonanMorgan You can't set min_insert_block_size_rows / max_insert_block_size on the server. They should be set as query properties in the JDBC query.
I think your issue is because of JSONEachRow / parallel parsing; try to set input_format_parallel_parsing=0 in JDBC.
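
For illustration, a sketch of the same settings applied per query with clickhouse-client (hypothetical table and file names); with the JDBC driver the identical setting names would be passed as query/connection properties rather than being put into users.xml:

# illustrative only: settings passed per query, not in the server config
cat data.json | clickhouse-client \
    --input_format_parallel_parsing=0 \
    --min_insert_block_size_rows=1048576 \
    --query='INSERT INTO my_table FORMAT JSONEachRow'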

Qsimple commented Jun 28, 2021

I meet the same error; the server version is 20.4.2. When I insert 400 MB+ of data into ONE partition of a MergeTree table, the Memory limit (total) exception happens. The client has retries, but in the end we got more data in ClickHouse than in the data source (Hive).
SUM(written_rows) in system.query_log shows the same data size as the data source, but sum(rows) in system.parts shows more.
There are exactly 1000 queries with type=2 in system.query_log and 4 queries with error code 241. How can I locate the extra data in ClickHouse?
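
A sketch of how one might narrow that down (hypothetical database and table names), comparing what the finished inserts reported against what the active parts actually contain:

-- rows actually stored, per partition
SELECT partition, sum(rows) AS rows_in_parts
FROM system.parts
WHERE database = 'db' AND table = 't' AND active
GROUP BY partition
ORDER BY partition;

-- rows reported as written by finished inserts (type = 2 is QueryFinish)
SELECT count() AS insert_queries, sum(written_rows) AS rows_written
FROM system.query_log
WHERE type = 2 AND query LIKE 'INSERT INTO db.t%' AND event_date >= today() - 30;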

Hoping for your response, thanks.


insert on distributed table encounter Memory limit (for query) exceeded #5959

Comments

renkun2020 commented Jul 10, 2019

Our ClickHouse version is 19.7.3 and it's a 3-shard, 2-replica cluster. Recently we encountered the following error messages:

2019.06.22 04:25:56.120012 [ 50 ] <2ee5b326-1f6b-4e73-b28b-5ab356f08ecd>executeQuery: Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 29.16 GiB (attempt to allocate chunk of 2147483648 bytes), maximum: 27.94 GiB (version 19.7.3.1) (from 172.31.0.67:5684) (in query: INSERT INTO codes FORMAT CSV), Stack trace:
2019.06.22 08:00:26.476234 [ 50 ] <82174cb3-b9c9-482e-b20c-eda5ae726d8b>executeQuery: Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 29.16 GiB (attempt to allocate chunk of 2147483648 bytes), maximum: 27.94 GiB (version 19.7.3.1) (from 172.31.0.67:16122) (in query: INSERT INTO codes FORMAT CSV), Stack trace:
.
2019.06.26 22:38:11.122749 [ 50 ] executeQuery: Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 48.13 GiB (attempt to allocate chunk of 4294967296 bytes), maximum: 48.00 GiB (version 19.7.3.1) (from 172.31.0.67:34576) (in query: INSERT INTO codes FORMAT CSV), Stack trace:
2019.06.26 23:48:11.923896 [ 29 ] <835c9322-79af-4223-98a0-179dc7cb32aa>executeQuery: Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 48.13 GiB (attempt to allocate chunk of 4294967296 bytes), maximum: 48.00 GiB (version 19.7.3.1) (from 172.31.0.67:38092) (in query: INSERT INTO codes FORMAT CSV), Stack trace:

It's deployed as a job on the local ClickHouse server; codes is a Distributed table based on ReplicatedReplacingMergeTree.

CREATE TABLE codes (
.
) ENGINE = Distributed(***, ***, codes_loc, rand())

CREATE TABLE codes_loc (
.
) ENGINE = ReplicatedReplacingMergeTree('/data/***/***/codes', '***', ctime) PARTITION BY toYYYYMMDD(ptime) ORDER BY (ptime, ip, matchId) SETTINGS index_granularity = 8192

The table has fewer than 40 columns, and here is the user profile:

<max_memory_usage>51539607552</max_memory_usage>
<max_memory_usage_for_all_queries>51539607552</max_memory_usage_for_all_queries>
<use_uncompressed_cache>1</use_uncompressed_cache>
<max_bytes_before_external_group_by>35769803776</max_bytes_before_external_group_by>
<max_bytes_before_external_sort>35769803776</max_bytes_before_external_sort>
<load_balancing>random</load_balancing>
<log_queries>1</log_queries>

My question is: why would a single insert query consume so much memory?
The data source is Mongo; it seems ClickHouse has to cache all the intermediate results in physical memory before inserting them into ClickHouse.
As the source data grows and becomes larger than the physical memory in the future, the error will occur again.
Are there any parameters that could avoid this?

thanks for your reply!


den-crane commented Jul 10, 2019

It could, if for example the table codes had 10k columns.

renkun2020 commented Jul 11, 2019

The table has fewer than 40 columns, and here is the user profile:

<max_memory_usage>51539607552</max_memory_usage>
<max_memory_usage_for_all_queries>51539607552</max_memory_usage_for_all_queries>
<use_uncompressed_cache>1</use_uncompressed_cache>
<max_bytes_before_external_group_by>35769803776</max_bytes_before_external_group_by>
<max_bytes_before_external_sort>35769803776</max_bytes_before_external_sort>
<load_balancing>random</load_balancing>
<log_queries>1</log_queries>

filimonov commented Jul 11, 2019

That sounds suspicious. Can you please:

  1. recheck select * from system.settings where changed=1 (especially the block sizes)
  2. do a tee to a file in your pipe chain to dump what exactly was passed. It's possible that something is wrong with the delimiters and CH interprets some big chunk of data as one huge field.
  3. if the problem reproduces, please examine the contents of the data piped to clickhouse. If it looks normal, please try to create a reproducible example.

renkun2020 commented Jul 11, 2019

That sounds suspicious. Can you please:

  1. recheck select * from system.settings where changed=1 (especially the block sizes)
  2. do a tee to a file in your pipe chain to dump what exactly was passed. It's possible that something is wrong with the delimiters and CH interprets some big chunk of data as one huge field.
  3. if the problem reproduces, please examine the contents of the data piped to clickhouse. If it looks normal, please try to create a reproducible example.

Thanks for your suggestions.
Here is the info for tip 1:
┌─name───────────────────────────────┬─value───────┬─changed─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ use_uncompressed_cache │ 1 │ 1 │ "Whether to use the cache of uncompressed blocks." │
│ load_balancing │ random │ 1 │ "Which replicas (among healthy replicas) to preferably send a query to (on the first attempt) for distributed processing." │
│ log_queries │ 1 │ 1 │ "Log requests and write the log to the system table." │
│ max_bytes_before_external_group_by │ 35769803776 │ 1 │ "" │
│ max_bytes_before_external_sort │ 35769803776 │ 1 │ "" │
│ max_memory_usage │ 51539607552 │ 1 │ "Maximum memory usage for processing of single query. Zero means unlimited." │
│ max_memory_usage_for_all_queries │ 51539607552 │ 1 │ "Maximum memory usage for processing all concurrently running queries on the server. Zero means unlimited." │
└────────────────────────────────────┴─────────────┴─────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The interpretation should be okay since we get the expected result in the ClickHouse table. Anyway, I will try the tee command and update here if there are more findings.
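
For reference, a minimal sketch of the tee step (the export command is a placeholder; the point is keeping an exact copy of what is piped into clickhouse-client):

# the export command is a placeholder; tee keeps an exact copy of what was sent
mongoexport ... \
  | tee /tmp/passed_data.csv \
  | clickhouse-client --query='INSERT INTO codes FORMAT CSV'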

renkun2020 commented Jul 17, 2019

I did some testing on DEV. Here is the command:
cat passed_data.csv | clickhouse-client -m --user=**** --password=**** --host=***** --port=9000 --database=test '--format_csv_delimiter=|' '--query=INSERT INTO temp FORMAT CSV'
temp is a local ReplicatedMergeTree table.

1.
<max_memory_usage>2147483648</max_memory_usage>
<max_bytes_before_external_group_by>1073741824</max_bytes_before_external_group_by>
<max_bytes_before_external_sort>1073741824</max_bytes_before_external_sort>

error message
2019.07.17 09:25:15.601204 [ 62 ] <94c5e01c-908b-4c3c-9201-48ce6aaf515e>MemoryTracker: Current memory usage: 1.01 GiB.
2019.07.17 09:25:15.650751 [ 20 ] <> test.rk1 (ReplicatedMergeTreeCleanupThread): Execution took 683 ms.
2019.07.17 09:25:15.788961 [ 16 ] <> test.rk1 (StorageReplicatedMergeTree::mergeSelectingTask): Execution took 793 ms.
2019.07.17 09:25:17.056353 [ 62 ] <94c5e01c-908b-4c3c-9201-48ce6aaf515e>executeQuery: Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 2.01 GiB (attempt to allocate chunk of 1073741824 bytes), maximum: 2.00 GiB (version 19.5.3.1) (from [::ffff:172.16.8.84]:41628) (in query: INSERT INTO matchs_loc_temp FORMAT CSV), Stack trace:

2.
<max_memory_usage>4294967296</max_memory_usage>
<max_bytes_before_external_group_by>2147483648</max_bytes_before_external_group_by>
<max_bytes_before_external_sort>2147483648</max_bytes_before_external_sort>

error message
2019.07.17 09:45:40.960206 [ 59 ] MemoryTracker: Current memory usage: 1.01 GiB.
2019.07.17 09:45:41.633137 [ 59 ] MemoryTracker: Current memory usage: 2.01 GiB.
2019.07.17 09:45:43.125405 [ 3 ] <> SystemLog (system.query_thread_log): Flushing system log
2019.07.17 09:45:43.125421 [ 59 ] executeQuery: Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 4.01 GiB (attempt to allocate chunk of 2147483648 bytes), maximum: 4.00 GiB (version 19.5.3.1) (from [::ffff:172.16.8.84]:41706) (in query: INSERT INTO matchs_loc_temp FORMAT CSV), Stack trace:

And the conclusion is:
For "INSERT ... FORMAT CSV", ClickHouse will cache all the intermediate results in physical memory, and the hard limit is decided by max_bytes_before_external_sort & max_memory_usage; we will encounter the "Memory limit" error once the CSV is too large or those limits are too small.

We split the CSV into several slices on Linux and then do the inserts in batches, which should avoid the error in the future.
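
A sketch of that workaround (paths and credentials are placeholders), splitting by line count and inserting the slices one by one:

# split into slices of e.g. 5000 lines and insert them sequentially
split -l 5000 passed_data.csv /tmp/passed_slice_

for f in /tmp/passed_slice_*; do
  cat "$f" | clickhouse-client -m --user=**** --password=**** --host=***** --port=9000 \
    --database=test '--format_csv_delimiter=|' '--query=INSERT INTO temp FORMAT CSV'
done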

filimonov commented Jul 17, 2019

How many rows do you insert at once? How big is each row? What are the data types of the columns?

renkun2020 commented Jul 18, 2019

How many rows do you insert at once? How big is each row? What are the data types of the columns?

Here are the table column data types:
┌─type─────┐
│ String │
│ String │
│ String │
│ String │
│ String │
│ String │
│ Int64 │
│ Int64 │
│ String │
│ String │
│ UInt8 │
│ UInt8 │
│ UInt8 │
│ Int32 │
│ String │
│ Int32 │
│ Float32 │
│ Int32 │
│ Int32 │
│ Float64 │
│ Float64 │
│ Int32 │
│ Int32 │
│ DateTime │
│ DateTime │
└──────────┘


25 rows in set. Elapsed: 0.002 sec.

And below is the test data:
[root@master tmp]# du -sh passed_data.csv
2.1G passed_data.csv
[root@master tmp]# wc -l passed_data.csv
36953 passed_data.csv

filimonov commented Jul 19, 2019

[root@master tmp]# du -sh passed_data.csv
2.1G passed_data.csv
[root@master tmp]# wc -l passed_data.csv
36953 passed_data.csv

That explains everything. You have an average row size of about 57 KB. Those are very big rows.

ClickHouse tries to form blocks of max_insert_block_size rows (by default 1048576) to sort them and put a new part into the destination table. In that case 1M rows is definitely too much. Try to decrease that setting to a lower value.

Also, because of the sparse index, scans in your case can be very costly, and it may be better to decrease index_granularity, or consider some data normalization / preprocessing.
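
A hedged sketch of one way to lower the block size per insert (hypothetical host and table): over the HTTP interface the server parses the data and forms the blocks, so max_insert_block_size can be passed as a URL parameter:

# server forms blocks of at most 65536 rows instead of the default 1048576
curl 'http://localhost:8123/?query=INSERT%20INTO%20test.temp%20FORMAT%20CSV&max_insert_block_size=65536&format_csv_delimiter=%7C' \
  --data-binary @passed_data.csv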

renkun2020 commented Jul 22, 2019

[root@master tmp]# du -sh passed_data.csv
2.1G passed_data.csv
[root@master tmp]# wc -l passed_data.csv
36953 passed_data.csv

That explains everything. You have an average row size of about 57 KB. Those are very big rows.

ClickHouse tries to form blocks of max_insert_block_size rows (by default 1048576) to sort them and put a new part into the destination table. In that case 1M rows is definitely too much. Try to decrease that setting to a lower value.

Also, because of the sparse index, scans in your case can be very costly, and it may be better to decrease index_granularity, or consider some data normalization / preprocessing.

Thanks for your suggestions.
max_insert_block_size was kept at its default, 1048576; now I shrank it to 1024, but I still get the same error:
[root@master tmp]# cat passed_data.csv | clickhouse-client -m --user=** --password=** --host=** --port=9000 --database=** '--format_csv_delimiter=|' '--query=INSERT INTO matchs_loc_temp FORMAT CSV'
Received exception from server (version 19.5.3):
Code: 241. DB::Exception: Received from **:9000. DB::Exception: Memory limit (for query) exceeded: would use 4.01 GiB (attempt to allocate chunk of 2147483648 bytes), maximum: 4.00 GiB.

And I notice that this parameter only takes effect when using HTTP; it has no effect with clickhouse-client:
https://clickhouse.yandex/docs/en/operations/settings/settings/
max_insert_block_size
The size of blocks to form for insertion into a table. This setting only applies in cases when the server forms the blocks. For example, for an INSERT via the HTTP interface, the server parses the data format and forms blocks of the specified size. But when using clickhouse-client, the client parses the data itself, and the ‘max_insert_block_size’ setting on the server doesn’t affect the size of the inserted blocks.

Anyway, I tried all the parameters related to block size; here are the latest settings:
┌─name─────────────────────────────────────┬─value─┐
│ max_compress_block_size │ 1024 │
│ max_block_size │ 1024 │
│ max_insert_block_size │ 1024 │
│ preferred_max_column_in_block_size_bytes │ 0 │
│ max_partitions_per_insert_block │ 100 │
└──────────────────────────────────────────┴───────┘

But the error is still there!
Finally I removed the limits on max_memory_usage & max_bytes_before_external_sort, and the import succeeded:
┌─name───────────────────────────┬─value─┐
│ max_bytes_before_external_sort │ 0 │
│ max_memory_usage │ 0 │
└────────────────────────────────┴───────┘
SELECT count(*)
FROM matchs_loc_temp
