Why did I receive a "No space left on device" or "DiskFull" error on Amazon RDS for PostgreSQL?
Last updated: 2022-06-23
I have a small Amazon Relational Database Service (Amazon RDS) for PostgreSQL database. The instance’s free storage space is decreasing, and I receive the following error:
"Error message: PG::DiskFull: ERROR: could not extend file "base/16394/5139755": No space left on device. HINT: Check free disk space."
I want to resolve the DiskFull errors and prevent storage issues.
Amazon RDS DB instance storage is used by the following:
- Temporary tables or files that are created by PostgreSQL transactions
- Data files
- Write-ahead logs (WAL)
- Replication slots
- DB logs (error files) that are retained for too long
- Other DB or Linux files that support the consistent state of the RDS DB instance
1. Use Amazon CloudWatch to monitor your DB storage space using the FreeStorageSpace metric. When you set a CloudWatch alarm for free storage space, you receive a notification when the space starts to decrease. If you receive an alarm, review the causes of storage issues mentioned previously.
2. If your DB instance is still consuming more storage than expected, check for the following:
- Size of the DB log files
- Presence of temporary files
- Constant increase in transaction logs disk usage
- Replication slots:
  - Physical replication slots are created by cross-Region read replicas, or by same-Region read replicas only if they are running on PostgreSQL 14.1 and higher versions
  - Logical replication slots are created for a replica or subscriber
- Bloat or improper removal of dead rows
- Presence of orphaned files
3. When your workload is unpredictable, enable storage autoscaling for your instance. With storage autoscaling enabled, Amazon RDS automatically scales your storage when it detects that you are running out of free database space. Amazon RDS starts a storage modification for an autoscaling-enabled DB instance when the following factors apply:
- Free available space is less than 10 percent of the allocated storage.
- The low-storage condition lasts at least five minutes.
- At least six hours have passed since the last storage modification, or storage optimization has completed on the instance, whichever is longer.
You can set a limit for autoscaling your DB instance by setting the maximum storage threshold. For more information, see Managing capacity automatically with Amazon RDS storage autoscaling.
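The three autoscaling conditions listed above can be expressed as a small check. This is only an illustrative sketch: the StorageState type and its field names are hypothetical, and reading "whichever is longer" as "both the six-hour wait and the completed optimization must hold" is an assumption, not a statement of how Amazon RDS implements the rule.

```python
from dataclasses import dataclass


@dataclass
class StorageState:
    """Hypothetical snapshot of an instance's storage situation."""
    allocated_gib: float            # allocated storage
    free_gib: float                 # free storage right now
    low_storage_minutes: float      # how long the low-storage condition has lasted
    hours_since_last_change: float  # time since the last storage modification
    optimization_complete: bool     # whether storage optimization has finished


def autoscaling_would_start(state: StorageState) -> bool:
    """Return True when all three conditions from the list above hold."""
    low_space = state.free_gib < 0.10 * state.allocated_gib
    sustained = state.low_storage_minutes >= 5
    # Assumption: "whichever is longer" means both parts must be satisfied.
    cooled_down = state.hours_since_last_change >= 6 and state.optimization_complete
    return low_space and sustained and cooled_down
```

For example, an instance with 100 GiB allocated and 9 GiB free for five minutes or more qualifies only if the last storage modification was at least six hours ago and optimization has finished.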
Check the size of the DB log files
By default, Amazon RDS for PostgreSQL error log files have a retention value of 4,320 minutes (three days). Large log files can use more space because of higher workloads or excessive logging. You can change the retention period for system logs using the rds.log_retention_period parameter in the DB parameter group associated with your DB instance. For example, if you set the value to 1440, then logs are retained for one day. For more information, see PostgreSQL database log files.
Also, you can change error reporting and logging parameters in the DB parameter group to reduce excessive logging. This in turn reduces the log file size. For more information, see Error reporting and logging.
Check for temporary files
Temporary files are files that are stored per backend or session connection. These files are used as a resource pool. Review temporary file statistics by querying the temp_files and temp_bytes columns of the pg_stat_database view.
Important: The temp_files and temp_bytes columns in the pg_stat_database view are cumulative. This is by design because these counters are reset only by recovery at server start. That is, the counters are reset after an immediate shutdown, a server crash, or a point-in-time recovery (PITR). For this reason, it’s a best practice to monitor the growth of these files in number and size, rather than reviewing only a single output.
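Because the counters are cumulative, growth is easier to see by comparing two snapshots taken some time apart. The sketch below assumes that you run the query in TEMP_STATS_SQL yourself (for example, with psql) and pass the results in as dictionaries; the function name and the input shape are hypothetical.

```python
# Query against the standard pg_stat_database view; run it with any client.
TEMP_STATS_SQL = "SELECT datname, temp_files, temp_bytes FROM pg_stat_database;"


def temp_file_growth(earlier, later):
    """Compare two snapshots of temporary file counters.

    Each snapshot maps a database name to (temp_files, temp_bytes).
    Returns a dict mapping database name to (files_delta, bytes_delta).
    A negative delta suggests that the counters were reset in between.
    """
    growth = {}
    for datname, (files_now, bytes_now) in later.items():
        files_then, bytes_then = earlier.get(datname, (0, 0))
        growth[datname] = (files_now - files_then, bytes_now - bytes_then)
    return growth
```

A database that shows a steadily rising bytes_delta between snapshots is a good candidate for a closer look at its sorts and hash operations.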
Temporary files are created for sorts, hashes, or temporary query results. To track the creation of temporary tables or files, set log_temp_files in a custom parameter group. This parameter controls the logging of temporary file names and sizes. If you set the log_temp_files value to 0, then all temporary file information is logged. If you set the parameter to a positive value, then only files that are equal to or larger than the specified number of kilobytes are logged. The default setting is -1, which disables the logging of temporary files.
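The three cases for log_temp_files (negative, zero, positive) can be summarized in a few lines. This is a sketch of the documented semantics only, not PostgreSQL code, and the function name is made up for illustration.

```python
def temp_file_is_logged(log_temp_files_kb: int, file_size_kb: int) -> bool:
    """Mirror the log_temp_files rule described above.

    -1 (any negative value here) disables logging, 0 logs every temporary
    file, and a positive value logs only files at least that many kilobytes.
    """
    if log_temp_files_kb < 0:
        return False
    return file_size_kb >= log_temp_files_kb
```

With a setting of 1024, for instance, a 512 KB temporary file is not logged, but a 2048 KB file is.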
You can also use an EXPLAIN ANALYZE of your query to review disk sorting. When you review the log output, you can see the size of temporary files created by your query. For more information, see the PostgreSQL documentation for Monitoring database activity.
Check for a constant increase in transaction logs disk usage
The CloudWatch metric for TransactionLogsDiskUsage represents the disk space used by transaction WALs. Increases in transaction log disk usage can happen because of:
- High DB loads (writes and updates that generate additional WALs)
- Streaming read replica lag (replicas in the same Region) or read replica in storage full state
- Replication slots
Replication slots can be created as part of the logical decoding feature of AWS Database Migration Service (AWS DMS). For logical replication, the slot parameter rds.logical_replication is set to 1. Replication slots retain the WAL files until the files are externally consumed by a consumer. For example, they might be consumed by pg_recvlogical; extract, transform, and load (ETL) jobs; or AWS DMS.
If you set the rds.logical_replication parameter value to 1, then Amazon RDS sets the wal_level, max_wal_senders, max_replication_slots, and max_connections parameters. Changing these parameters can increase WAL generation. It’s a best practice to set the rds.logical_replication parameter only when you are using logical slots. If this parameter is set to 1 and logical replication slots are present but there isn’t a consumer for the WAL files retained by the replication slot, then transaction logs disk usage can increase. This also results in a constant decrease in free storage space.
To confirm the presence and size of replication slots, query the pg_replication_slots view. On PostgreSQL v10 and later, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) returns the amount of WAL that each slot retains.
After you identify a replication slot that isn’t being consumed (with an active state that is False), drop the slot by calling the pg_drop_replication_slot function with the slot name.
Note: If an AWS DMS task is the consumer and it is no longer required, then delete the task and manually drop the replication slot.
For example, suppose that the slot named xc36ujql35djp_00013322_907c1e0a_9f8b_4c13_89ea_ef0ea1cf143d has an active state that is False. This slot isn’t actively used, and in this example it retains 129 GB of transaction files.
If no consumer needs the slot, drop it by calling pg_drop_replication_slot with that slot name.
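The check-then-drop workflow for replication slots can be sketched as follows. The SQL uses the standard pg_replication_slots view and the pg_wal_lsn_diff, pg_current_wal_lsn, and pg_drop_replication_slot functions available in PostgreSQL v10 and later. The Python helpers, the row shape (slot_name, lag, active), and the second sample slot are assumptions for illustration; running the statements is left to your own client.

```python
import re

# Lists each slot with the WAL it retains; run on the primary instance.
SLOT_LAG_SQL = (
    "SELECT slot_name, "
    "pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) "
    "AS replication_slot_lag, active "
    "FROM pg_replication_slots;"
)

# Slot names may contain only lowercase letters, digits, and underscores.
_VALID_SLOT_NAME = re.compile(r"[a-z0-9_]+")


def inactive_slot_names(rows):
    """Return the names of slots whose active flag is False.

    rows is an iterable of (slot_name, replication_slot_lag, active) tuples.
    """
    return [name for name, _lag, active in rows if not active]


def drop_slot_statement(slot_name):
    """Build the statement that drops one slot, after validating its name."""
    if not _VALID_SLOT_NAME.fullmatch(slot_name):
        raise ValueError(f"unexpected replication slot name: {slot_name!r}")
    return f"SELECT pg_drop_replication_slot('{slot_name}');"


# Sample rows: the first matches the example above, the second is made up.
rows = [
    ("xc36ujql35djp_00013322_907c1e0a_9f8b_4c13_89ea_ef0ea1cf143d", "129 GB", False),
    ("healthy_slot", "16 MB", True),
]
for name in inactive_slot_names(rows):
    print(drop_slot_statement(name))
```

Review each generated statement before running it, because dropping a slot that a consumer still needs breaks that consumer's replication.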
Check the status of cross-Region read replicas
When you use cross-Region read replication, a physical replication slot is created on the primary instance. If the cross-Region read replica fails, then the storage space on the primary DB instance can be affected. This happens because the WAL files aren’t replicated over to the read replica, so they accumulate on the primary. You can use the CloudWatch metrics OldestReplicationSlotLag and TransactionLogsDiskUsage to determine how far behind the most lagging replica is and how much storage is used for WAL data.
To check the status of a cross-Region read replica, query the pg_replication_slots view. For more information, see the PostgreSQL documentation for pg_replication_slots. If the active state is returned as false, then the slot is not currently used for replication.
You can also use the pg_stat_replication view on the source instance to check the statistics for the replication. For more information, see the PostgreSQL documentation for pg_stat_replication.
Check for bloat or improper removal of dead rows (tuples)
In normal PostgreSQL operations, tuples that are deleted or made obsolete by an UPDATE aren’t removed from their table. For Multi-Version Concurrency Control (MVCC) implementations, when a DELETE operation is performed the row isn’t immediately removed from the data file. Instead, the row is marked as deleted by setting the xmax field in a header. Updates mark rows for deletion first, and then carry out an insert operation. This allows concurrency with minimal locking between the different transactions. As a result, different row versions are kept as part of the MVCC process.
If dead rows aren’t cleaned up, they stay in the data files but remain invisible to any transaction, which impacts disk space. If a table has many DELETE and UPDATE operations, then the dead tuples might use a large amount of disk space. This is sometimes called "bloat" in PostgreSQL.
The VACUUM operation can free the storage used by dead tuples so that it can be reused, but this doesn’t release the free storage to the filesystem. Running VACUUM FULL releases the storage to the filesystem. Note, however, that during the time of the VACUUM FULL run an access exclusive lock is held on the table. This method also requires extra disk space because it writes a new copy of the table and doesn’t release the old copy until the operation is complete. It’s a best practice to use this method only when you must reclaim a significant amount of space from within the table. It’s also a best practice to perform periodic vacuum or autovacuum operations on tables that are updated frequently. For more information, see the PostgreSQL documentation for VACUUM.
To check for the estimated number of dead tuples, use the pg_stat_all_tables view. For more information, see the PostgreSQL documentation for pg_stat_all_tables view. The n_dead_tup column holds the estimate. For example, an n_dead_tup value of 1999952 means that the table has an estimated 1,999,952 dead tuples.
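One way to put the n_dead_tup estimate in context is to compare it with the live tuple count from the same view. The query below uses the standard relname, n_live_tup, and n_dead_tup columns of pg_stat_all_tables. The helper function is a hypothetical convenience, and any threshold that you apply to its result is your own choice.

```python
# Lists tables with the most estimated dead tuples first.
DEAD_TUPLES_SQL = (
    "SELECT relname, n_live_tup, n_dead_tup "
    "FROM pg_stat_all_tables "
    "ORDER BY n_dead_tup DESC;"
)


def dead_tuple_fraction(n_live_tup: int, n_dead_tup: int) -> float:
    """Return the share of a table's tuples that are estimated to be dead."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0
```

A table where this fraction stays high after autovacuum has run is a reasonable candidate for a manual VACUUM or a review of its autovacuum settings.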
Check for orphaned files
Orphaned files can occur when the files are present in the database directory but there are no objects that point to those files. This might happen if your instance runs out of storage or the engine crashes during an operation such as ALTER TABLE, VACUUM FULL, or CLUSTER. To check for orphaned files, follow these steps:
1. Log in to PostgreSQL in each database.
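2. Compare the used size with the real size. The used size is the space that the database files occupy, as reported by pg_database_size. The real size is the sum of pg_relation_size over the objects in pg_class.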
3. Note the results. If the difference is significant, then orphaned files might be using storage space.
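The comparison in the steps above might look like the following sketch. Both queries use standard PostgreSQL functions and return sizes in bytes. The helper names and the 20 percent threshold for a "significant" difference are assumptions chosen for illustration. Some gap is normal, so treat the result as a hint rather than proof.

```python
# Space occupied by the files of the current database, in bytes.
DATABASE_SIZE_SQL = "SELECT pg_database_size(current_database());"

# Space accounted for by summing the objects in pg_class, in bytes.
OBJECTS_SIZE_SQL = "SELECT SUM(pg_relation_size(oid)) FROM pg_class;"


def unaccounted_bytes(database_size: int, objects_size: int) -> int:
    """Return how many bytes on disk are not explained by known objects."""
    return max(database_size - objects_size, 0)


def orphaned_files_suspected(database_size: int, objects_size: int,
                             threshold_fraction: float = 0.2) -> bool:
    """Flag a database when the gap exceeds a share of its size.

    threshold_fraction is a hypothetical cutoff, not an AWS recommendation.
    """
    gap = unaccounted_bytes(database_size, objects_size)
    return gap > threshold_fraction * database_size
```

Run both queries in each database, as described in step 1, because pg_class lists only the objects of the database that you are connected to.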
What causes the JNI error "use of deleted local reference"?
I have an Android app in which a native method is called when the app starts (in Activity.onCreate).
When this method is called the app crashes and I get the error:
JNI DETECTED ERROR IN APPLICATION: use of deleted local reference 0xd280e8d5
Step debugging shows that the call to env->CallStaticObjectMethod causes the crash.
What causes this error? And how can I call System.getProperty("os.name") using JNI without getting this error?
The issue is that env->CallStaticObjectMethod expects a jstring as its third argument but is supplied with a string literal instead.
Creating a jstring first, with env->NewStringUTF("os.name"), and passing it in place of the literal fixed the problem.
In my case, I was using a local reference that was created in one function and used in another.
Suppose that a call in a function funA returns a local reference to a jclass. If we store this reference in a global variable and use it in another function funB after funA has completed, then the local reference is no longer valid, and using it produces the "use of deleted local reference" error.
To resolve this, first get the local reference, and then create a global reference from it with env->NewGlobalRef. The resulting global reference, globalClass, can be used globally (in different functions).
Troubleshoot instance launch issues
The following issues prevent you from launching an instance.
Invalid device name
You get the Invalid device name device_name error when you try to launch a new instance.
This error means that the device name specified for one or more volumes in the request is invalid. Possible causes include:
The device name might be in use by the selected AMI.
The device name might be reserved for root volumes.
The device name might be used for another volume in the request.
The device name might not be valid for the operating system.
To resolve the issue:
Ensure that the device name is not used in the AMI that you selected. To view the device names used by the AMI, use the describe-images command and review the DeviceName values in its block device mappings.
Ensure that you are not using a device name that is reserved for root volumes. For more information, see Available device names.
Ensure that each volume specified in your request has a unique device name.
Ensure that the device names that you specified are in the correct format. For more information, see Available device names.
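The first and third checks above (device names already used by the AMI, and duplicate names within the request) can be sketched against the JSON that describe-images returns. The Images, BlockDeviceMappings, and DeviceName keys are part of that response. The function names and the way the requested device names are supplied are assumptions for illustration.

```python
import json


def ami_device_names(describe_images_json: str) -> set:
    """Collect the device names used by the AMIs in a describe-images response."""
    response = json.loads(describe_images_json)
    return {
        mapping["DeviceName"]
        for image in response.get("Images", [])
        for mapping in image.get("BlockDeviceMappings", [])
        if "DeviceName" in mapping
    }


def device_name_problems(requested, ami_names):
    """Report requested device names that collide with the AMI or each other."""
    problems = []
    seen = set()
    for name in requested:
        if name in seen:
            problems.append(f"{name}: used more than once in the request")
        elif name in ami_names:
            problems.append(f"{name}: already used by the AMI")
        seen.add(name)
    return problems
```

This sketch does not check reserved root device names or operating-system naming rules. For those, see Available device names.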
Instance limit exceeded
You get the InstanceLimitExceeded error when you try to launch a new instance or restart a stopped instance.
This error means that you have reached the limit on the number of instances that you can launch in a Region. When you create your AWS account, we set default limits on the number of instances you can run on a per-Region basis.
You can request an instance limit increase on a per-Region basis. For more information, see Amazon EC2 service quotas.
Insufficient instance capacity
You get the InsufficientInstanceCapacity error when you try to launch a new instance or restart a stopped instance.
This error means that AWS does not currently have enough available On-Demand capacity to fulfill your request.
To resolve the issue, try the following:
Wait a few minutes and then submit your request again; capacity can shift frequently.
Submit a new request with a reduced number of instances. For example, if you’re making a single request to launch 15 instances, try making 3 requests for 5 instances, or 15 requests for 1 instance instead.
If you’re launching an instance, submit a new request without specifying an Availability Zone.
If you’re launching an instance, submit a new request using a different instance type (which you can resize at a later stage). For more information, see Change the instance type.
If you are launching instances into a cluster placement group, you can get an insufficient capacity error. For more information, see Working with placement groups.
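The suggestion above to split one large request into several smaller ones comes down to simple arithmetic. The helper below is a hypothetical sketch that only computes the batch sizes. Submitting each batch is left to whichever tool you use to launch instances.

```python
def split_launch_request(total_instances: int, batch_size: int) -> list:
    """Split a launch request into batches of at most batch_size instances."""
    if total_instances < 0 or batch_size < 1:
        raise ValueError("total_instances must be >= 0 and batch_size >= 1")
    full_batches, remainder = divmod(total_instances, batch_size)
    return [batch_size] * full_batches + ([remainder] if remainder else [])
```

For the example in the list, a request for 15 instances with a batch size of 5 becomes three requests for 5 instances each, and a batch size of 1 becomes 15 single-instance requests.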
The requested configuration is currently not supported. Please check the documentation for supported configurations.
You get the Unsupported error when you try to launch a new instance because the instance configuration is not supported.
The error message provides additional details. For example, an instance type or instance purchasing option might not be supported in the specified Region or Availability Zone.
Try a different instance configuration. To search for an instance type that meets your requirements, see Find an Amazon EC2 instance type.
Instance terminates immediately
Your instance goes from the pending state to the terminated state.
The following are a few reasons why an instance might immediately terminate:
You’ve exceeded your EBS volume limits. For more information, see Instance volume limits.
An EBS snapshot is corrupted.
The root EBS volume is encrypted and you do not have permissions to access the KMS key for decryption.
A snapshot specified in the block device mapping for the AMI is encrypted and you do not have permissions to access the KMS key for decryption or you do not have access to the KMS key to encrypt the restored volumes.
The instance store-backed AMI that you used to launch the instance is missing a required part (an image.part.xx file).
For more information, get the termination reason using one of the following methods.
To get the termination reason using the Amazon EC2 console
In the navigation pane, choose Instances, and select the instance.
On the first tab, find the reason next to State transition reason.
To get the termination reason using the AWS Command Line Interface
Use the describe-instances command and specify the instance ID.
Review the JSON response returned by the command and note the values in the StateReason response element.
The StateReason response element contains a Code and a Message. For example, the Message might be Client.VolumeLimitExceeded: Volume limit exceeded.
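If you save the JSON output of describe-instances, a short script can pull out the StateReason for each instance. The Reservations, Instances, InstanceId, and StateReason keys are part of that response. The function name and the sample document, including its instance ID and reason values, are made up for illustration.

```python
import json


def termination_reasons(describe_instances_json: str) -> dict:
    """Map each instance ID to its (Code, Message) state reason, if present."""
    response = json.loads(describe_instances_json)
    reasons = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state_reason = instance.get("StateReason")
            if state_reason:
                reasons[instance["InstanceId"]] = (
                    state_reason.get("Code"),
                    state_reason.get("Message"),
                )
    return reasons


# Hypothetical sample shaped like a describe-instances response.
sample_response = json.dumps({
    "Reservations": [{
        "Instances": [{
            "InstanceId": "i-1234567890abcdef0",
            "StateReason": {
                "Code": "Server.InternalError",
                "Message": "Client.VolumeLimitExceeded: Volume limit exceeded",
            },
        }]
    }]
})
print(termination_reasons(sample_response))
```

The Message value is usually the more specific of the two fields, so it is the better one to match against the list of termination causes above.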
To get the termination reason using AWS CloudTrail
For more information, see Viewing events with CloudTrail event history in the AWS CloudTrail User Guide.
Depending on the termination reason, take one of the following actions: