Mon, 15 Jun 2009

Copying files with netcat

When you want to copy files from one machine to another, you might think about using scp to copy them. You might think about using rsync. If, however, you're trying to copy a large amount of data between two machines, a better, quicker way to do it is with netcat.

On the receiving machine, run:

# cd /dest/dir && nc -l -p 12345 | tar -xf -

On the sending machine you can now run:

# cd /src/dir && tar -cf - . | nc -q 0 remote-server 12345

You should find that everything works nicely, and a lot quicker. If bandwidth is more constrained than CPU, then you can add "z" or "j" to the tar options on both ends ("tar -czf - ." on the sender, "tar -xzf -" on the receiver) to compress the data before it is sent over the network. If you're on gigabit, I wouldn't bother with the compression. If the transfer dies, you'll have to start from the beginning, but then you might find you can get away with using rsync to finish the job if you've copied enough. It's also worth pointing out that the receiving netcat will exit as soon as the connection closes, so you'll need to restart it if you want to copy the data again using this method.
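
As a rough sketch of the compressed variant, the two tar halves can be wrapped in small shell functions so that the sender and receiver always use matching options. The function names pack and unpack are made up for this illustration, and the nc flags in the comments are simply the ones used above; they differ between netcat variants, so check your local man page.

```shell
#!/bin/sh
# Hypothetical helpers, not part of tar or netcat.

# pack SRC_DIR: write a gzip-compressed tar stream of SRC_DIR to stdout.
pack() {
    tar -czf - -C "$1" .
}

# unpack DEST_DIR: read a gzip-compressed tar stream from stdin into DEST_DIR.
unpack() {
    tar -xzf - -C "$1"
}

# Over the network, with the same host and port as above:
#   receiver:  nc -l -p 12345 | unpack /dest/dir
#   sender:    pack /src/dir | nc -q 0 remote-server 12345
```

Swapping "z" for "j" in both functions gives bzip2 instead of gzip. You can rehearse the pipeline on one machine with "pack /src/dir | unpack /tmp/test" before involving the network.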

It's worth pointing out that this does not have the security that scp or rsync-over-ssh has, so make sure you trust the end points and everything in between if you don't want anyone else to see the data.

Why not use scp? Because it's incredibly slow in comparison. God knows what scp is doing, but it doesn't copy data at wire speed. It ain't the encryption and decryption, because that'd just use CPU, and when I've done it, it hasn't been CPU bound. I can only assume that the scp process has a lot of handshaking and ssh protocol overhead.

Why not rsync? Rsync doesn't really buy you that much on the first copy. It's only the subsequent runs where rsync really shines. However, rsync requires the source to send a complete file list to the destination before it starts copying any data. If you've got a filesystem with a large number of files, that's an awfully large overhead, especially as the destination host has to hold it in memory.

[linux, netcat, rsync, scp]

Wed, 10 Jun 2009

Table sizes in PostgreSQL

Ever wanted to find out how much disk space each table was taking in a database? Here's how:

database=# SELECT 
   tablename, 
   pg_size_pretty(pg_relation_size(tablename)) AS table_size, 
   pg_size_pretty(pg_total_relation_size(tablename)) AS total_table_size 
FROM 
   pg_tables 
WHERE 
   schemaname = 'public';
 tablename  | table_size | total_table_size 
------------+------------+------------------
 deferrals  | 205 MB     | 486 MB
 errors     | 58 MB      | 137 MB
 deliveries | 2646 MB    | 10096 MB
 queue      | 7464 kB    | 22 MB
 unknown    | 797 MB     | 2644 MB
 messages   | 1933 MB    | 6100 MB
 rejects    | 25 GB      | 75 GB
(7 rows)

Table size is the size of the table's main data store. Total table size also includes indexes and data that is too large to fit in the main table store (things like large BLOB fields). You can find more information in the PostgreSQL manual.
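
If you run this often, one option is to keep the query in a small shell function and pipe it into psql. This is only a sketch: the function name table_size_query is invented here, the query is a variation on the one above rather than the same text (it schema-qualifies each table name, casts it to regclass explicitly, and sorts the largest tables first), and the schema name is pasted into the SQL unescaped, so only pass values you trust.

```shell
#!/bin/sh
# Hypothetical helper: print the size query for one schema (default: public),
# largest tables first. The schema name is NOT escaped; trusted input only.
table_size_query() {
    schema="${1:-public}"
    cat <<EOF
SELECT
   tablename,
   pg_size_pretty(pg_relation_size(rel)) AS table_size,
   pg_size_pretty(pg_total_relation_size(rel)) AS total_table_size
FROM (
   SELECT
      tablename,
      (quote_ident(schemaname) || '.' || quote_ident(tablename))::regclass AS rel
   FROM
      pg_tables
   WHERE
      schemaname = '$schema'
) AS t
ORDER BY
   pg_total_relation_size(rel) DESC;
EOF
}

# Usage (needs a reachable server; "database" is the database name used above):
#   table_size_query public | psql -d database
```

The explicit regclass cast is there so the size functions get a schema-qualified, properly quoted name, which should also cope with mixed-case table names.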

Edit: changed to use pg_size_pretty(), which I thought existed, but couldn't find in the docs. Brett Parker reminded me it did exist after all and I wasn't just imagining it.

[PostgreSQL]