Gentoo Forums
[Solved] Problems with Seagate D4 / ISCSI
therealjrd
Tux's lil' helper
Joined: 18 May 2006
Posts: 122

Posted: Wed Nov 30, 2016 3:34 am    Post subject: [Solved] Problems with Seagate D4 / ISCSI

Hi all.

I acquired a Seagate D4 NAS box. I've configured it as an iSCSI device.

It connects fine, I created an ext4 filesystem on it, and superficial tests work great. iozone works great too.

But when I try to rsync my main server filesystem to it, it runs for a while, then complains of being unable to write to a read-only file system. fsck then reports all sorts of carnage.

Looking for suggestions on how to debug. Thanks in advance . . .


Last edited by therealjrd on Wed Dec 28, 2016 6:00 pm; edited 1 time in total
therealjrd
Tux's lil' helper
Joined: 18 May 2006
Posts: 122

Posted: Mon Dec 05, 2016 1:08 pm

Bump

I've rebuilt open-iscsi with USE=debug. I've turned on CRC32C checking for both header and data digests. Those measures seem to help, in that the time-to-failure is longer, but it still fails after a while.
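
For anyone wanting to try the same digest settings, here is a minimal sketch of how they could be set on a node record with iscsiadm. The target IQN and portal are made-up placeholders, the helper name is hypothetical, and the field names are the ones I understand open-iscsi to use; compare them against the output of `iscsiadm -m node -o show` on your version. The script only builds and prints the commands, it does not run them.

```python
import shlex

# Hypothetical placeholders: substitute the IQN and portal of your own target.
TARGET = "iqn.2016-11.example.nas:target0"
PORTAL = "192.168.1.50:3260"


def digest_update_commands(target, portal, digest="CRC32C"):
    """Build iscsiadm argv lists that set header and data digests on a node record."""
    commands = []
    for field in ("HeaderDigest", "DataDigest"):
        commands.append([
            "iscsiadm", "-m", "node",
            "-T", target, "-p", portal,
            "-o", "update",
            "-n", f"node.conn[0].iscsi.{field}",
            "-v", digest,
        ])
    return commands


if __name__ == "__main__":
    # Print shell-quoted command lines for review (shlex.join needs Python 3.8+).
    for argv in digest_update_commands(TARGET, PORTAL):
        print(shlex.join(argv))
```

As far as I know, a changed node record only takes effect on the next login, so the session has to be logged out and back in before the digests are actually negotiated.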

I *think* this is telling me that one or the other NIC is flaky. Anybody out there have similar experience? Ideas on how to debug further before I start throwing hardware at it?
therealjrd
Tux's lil' helper
Joined: 18 May 2006
Posts: 122

Posted: Sun Dec 11, 2016 5:18 pm

I've done a few more experiments on this.

When I mount -o sync, everything works perfectly: I can run rsync for days with zero errors. That's the good news. The bad news is that the performance is terrible :( It takes days to sync a TB.

I've also turned on CRC32C checking for both data and header digests, and told the device to do the same. No discernible difference. So I no longer think I'm looking at NIC problems.

Mounting with different combinations of options seems to make some difference, but nothing definitive. data=ordered,commit=1,debug,barrier=1 seems to work best, in that it survives longest before errors start showing up, but I've found no combination that makes it reliable.
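
To make this kind of trial and error a bit more systematic, here is a rough sketch that lists every combination of the four options above as a mount command line. The device node and mount point are assumptions for illustration only, and the script just prints the commands rather than running them.

```python
from itertools import combinations

# Options taken from the post above; device and mount point are hypothetical.
OPTIONS = ["data=ordered", "commit=1", "debug", "barrier=1"]
DEVICE = "/dev/sdd1"      # assumption: first partition on the iSCSI disk
MOUNT_POINT = "/mnt/nas"  # assumption: any empty directory


def mount_commands(options, device, mount_point):
    """Yield one mount command line for every non-empty subset of options."""
    for size in range(1, len(options) + 1):
        for subset in combinations(options, size):
            yield f"mount -t ext4 -o {','.join(subset)} {device} {mount_point}"


if __name__ == "__main__":
    for line in mount_commands(OPTIONS, DEVICE, MOUNT_POINT):
        print(line)
```

Each printed line can then be tried in turn against the same rsync workload, noting how long each one survives before the first error.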

Googling a bit for similar setups doesn't turn up much. Does anyone have pointers to other deployments using iSCSI to talk to one of these Seagate devices?

Another option, of course, is to stop trying to use the Seagate device as a block device, and turn on its internal NFS server.

Any hints appreciated.
therealjrd
Tux's lil' helper
Joined: 18 May 2006
Posts: 122

Posted: Wed Dec 28, 2016 5:59 pm

Well, ok, FTR, I've sort of figured this out.

It seems that some devices, including the Seagate unit I have, do not do a good job of flow control. It's possible for the initiator to overload them, after which they seem to garble or drop requests. With a file system, that manifests as FS corruption.

Modern kernels have a workaround, but you have to know where to look for it. There's a long thread about this topic here: https://bugzilla.kernel.org/show_bug.cgi?id=93581

What I did to "fix" this:

1. Use parted to partition the device. This allows for optimal sector alignment, and more efficient IO. I followed the instructions here: https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Disks#Default:_Using_parted_to_partition_the_disk

2. Per the above bug report, set the value of /sys/block/sdd/queue/max_sectors_kb much lower. On my 4.4.26 kernel the default is 32K. I'm still trying different values, but it looks like 256 is enough to keep the device maxed out.

3. Use mount options -o commit=1,barrier=1,block_validity. It's not clear that these make much difference; I started using combinations of them before I discovered max_sectors_kb. Leaving them on does seem to smooth out the IO performance, as measured by the self-monitoring software in the Seagate.
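
Step 2 can be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, the `sysfs_root` parameter exists purely so the code can be tried against a scratch directory, and `sdd` is simply the device name on my box. It assumes the standard `max_hw_sectors_kb` attribute sits next to `max_sectors_kb`, and refuses values above that hardware limit.

```python
from pathlib import Path


def queue_attr(device, name, sysfs_root="/sys/block"):
    """Return the path of a block-queue attribute, e.g. sdd/queue/max_sectors_kb."""
    return Path(sysfs_root) / device / "queue" / name


def read_kb(device, name, sysfs_root="/sys/block"):
    """Read an integer queue attribute (value in KB)."""
    return int(queue_attr(device, name, sysfs_root).read_text().strip())


def set_max_sectors_kb(device, kb, sysfs_root="/sys/block"):
    """Write a new max_sectors_kb, refusing values above the hardware limit."""
    hw_limit = read_kb(device, "max_hw_sectors_kb", sysfs_root)
    if not 0 < kb <= hw_limit:
        raise ValueError(f"{kb} is outside 1..{hw_limit}")
    queue_attr(device, "max_sectors_kb", sysfs_root).write_text(f"{kb}\n")
    # Read the value back so the caller sees what is actually in effect.
    return read_kb(device, "max_sectors_kb", sysfs_root)


# Example (as root, against the real sysfs): set_max_sectors_kb("sdd", 256)
```

The value does not persist across reboots, so it has to be reapplied each time, for example from a local startup script.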

By taking these measures, I've been able to write many TB onto the device with no corruption and no errors in the syslog other than an occasional disconnect/reconnect. I plan to do some more experiments, run some iozone tests, etc., before I start trusting this device with real data.

In case anybody's trying to interface to an iSCSI device (or other block device) and seeing weird problems, I recommend reading over that bug report and reducing the value of max_sectors_kb.
gwong
n00b
Joined: 01 Jan 2017
Posts: 1

Posted: Sun Jan 01, 2017 6:20 am

Had a very similar issue. After reading this, now it is solved. Thanks.
All times are GMT