axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 5:07 am Post subject: sort [solved] |
|
|
sort is part of coreutils 8, at least on my system. i've been using sort for 2 decades, yet recently i came across a file that sort just died trying to sort.
anyway, try downloading the list from https://dale.ro/sort.xz, unpack it with xz -d -v sort.xz, and then just run sort sort > sort1.
URL corrected. — JRG
is this behaviour new? because sort didn't work like this the last time i checked. i was amazed at what it does now.
sort just starts writing stuff to /tmp, and on my system it runs out of 15 gigs of tmpfs. i tried with an actual hard disk as tmp... but what is the point. i wrote 2 php files on the fly to count what i wanted counted.
when did sort command on linux get this retarded?
Last edited by axl on Fri Jul 20, 2018 10:00 pm; edited 1 time in total |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Fri Jul 20, 2018 7:19 am Post subject: |
|
|
Using sys-apps/coreutils-8.28-r1 :
mikuru /tmp $ time sort sort > sort1
real 0m27.350s
user 1m0.306s
sys 0m0.489s
mikuru /tmp $ ls -l sort sort1
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:00 sort
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:02 sort1
This is on an 8GB machine, 4GB tmpfs. Didn't seem to get the same behavior...?
Which kernel and architecture are you using? _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Fri Jul 20, 2018 7:48 am Post subject: |
|
|
axl: this could be a locale problem. Please try:
Code: | LC_ALL=C time sort sort > sort1
|
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 3:03 pm Post subject: |
|
|
eccerr0r wrote: | Using sys-apps/coreutils-8.28-r1 :
mikuru /tmp $ time sort sort > sort1
real 0m27.350s
user 1m0.306s
sys 0m0.489s
mikuru /tmp $ ls -l sort sort1
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:00 sort
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:02 sort1
This is on an 8GB machine, 4GB tmpfs. Didn't seem to get the same behavior...?
Which kernel and architecture are you using? |
4.17.8. x64. coreutils 8.30. |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 3:06 pm Post subject: |
|
|
mike155 wrote: | axl: this could be a locale problem. Please try:
Code: | LC_ALL=C time sort sort > sort1
|
|
same result. it takes 1-2 minutes until tmpfs /tmp is full, then sort complains about running out of space, deletes its files from tmp and quits.
what is strange is that it takes up a ton of memory (top says it's > 30%) and on top of that it is writing to disk as well.
equery k sys-apps/coreutils-8.30
* Checking sys-apps/coreutils-8.30 ...
255 out of 255 files passed |
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10689 Location: Somewhere over Atlanta, Georgia
|
Posted: Fri Jul 20, 2018 3:12 pm Post subject: |
|
|
Make some swap: problem solved. It's wholly expected that sort will use temporary disk space—and lots of RAM—for large files; it's not a bug in any way.
- John _________________ I can confirm that I have received between 0 and 499 National Security Letters. |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 3:16 pm Post subject: |
|
|
16 gigs of ram for 250mb file to sort? |
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10689 Location: Somewhere over Atlanta, Georgia
|
Posted: Fri Jul 20, 2018 5:49 pm Post subject: |
|
|
Hmm. Perhaps not. Did you take down your sample file? I'm not able to download it (but that may be my corporate firewall butting in). Not reproducing your results with a concocted random file: less than 3 seconds to sort.
- John |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 5:52 pm Post subject: |
|
|
John R. Graham wrote: | Hmm. Perhaps not. Did you take down your sample file? I'm not able to download it (but that may be my corporate firewall butting in). Not reproducing your results with a concocted random file.
- John |
still up and visible from the internet. checked with http://validator.w3.org/check?uri=dale.ro/sort.xz |
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Fri Jul 20, 2018 5:59 pm Post subject: |
|
|
axl: the link you gave in your first post doesn't work. It works, if you manually remove the dot at the end.
Sort works like a charm on my machine. It takes about 5 seconds of CPU time (under 2 seconds of wall-clock time) to sort your file. No temp files, everything is done in memory.
Code: | > /usr/bin/time -v sort sort > sort1
Command being timed: "sort sort"
User time (seconds): 4.57
System time (seconds): 0.25
Percent of CPU this job got: 263%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.83
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2048000
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1568
Voluntary context switches: 407
Involuntary context switches: 289
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
> free
total used free shared buff/cache available
Mem: 16323692 843904 13130536 910536 2349252 14118436
Swap: 0 0 0
|
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10689 Location: Somewhere over Atlanta, Georgia
|
Posted: Fri Jul 20, 2018 6:11 pm Post subject: |
|
|
mike155 wrote: | The link you gave in your first post doesn't work. It works, if you manually remove the dot at the end. | Ahh, good catch. Not reproducing axl's issue here either, although his data does take longer to sort than my concocted data: 9-ish seconds vs. 3-ish seconds.
- John |
|
Back to top |
|
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Fri Jul 20, 2018 6:25 pm Post subject: |
|
|
It's easy to reproduce axl's issue if you try:
Code: | /usr/bin/time -v sort -S 10 sort > sort1
|
@axl: if your machine has plenty of RAM, sort will sort your 260MB file in memory. If your machine has only a few kB of RAM or if you limit sort's buffer size using option -S, sort will start to create zillions of temporary files in /tmp - and it will take a long time. How much RAM is available on your machine? Please post the output of
|
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 6:42 pm Post subject: |
|
|
mike155 wrote: | It's easy to reproduce axl's issue if you try:
Code: | /usr/bin/time -v sort -S 10 sort > sort1
|
@axl: if your machine has plenty of RAM, sort will sort your 260MB file in memory. If your machine has only a few kB of RAM or if you limit sort's buffer size using option -S, sort will start to create zillions of temporary files in /tmp - and it will take a long time. How much RAM is available on your machine? Please post the output of
|
Hmm, you are right.
time sort -S 10M sort > sort1
real 0m15.923s
user 0m14.634s
sys 0m1.201s
Code: | cat /proc/meminfo
MemTotal: 30829708 kB
MemFree: 526752 kB
MemAvailable: 19217292 kB
Buffers: 1060 kB
Cached: 20185940 kB
SwapCached: 652 kB
Active: 8974356 kB
Inactive: 20237740 kB
Active(anon): 7773144 kB
Inactive(anon): 2778600 kB
Active(file): 1201212 kB
Inactive(file): 17459140 kB
Unevictable: 2120 kB
Mlocked: 2120 kB
SwapTotal: 16777212 kB
SwapFree: 16704764 kB
Dirty: 260 kB
Writeback: 0 kB
AnonPages: 8977668 kB
Mapped: 713236 kB
Shmem: 1526648 kB
Slab: 657136 kB
SReclaimable: 479408 kB
SUnreclaim: 177728 kB
KernelStack: 17360 kB
PageTables: 81032 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 32192064 kB
Committed_AS: 24138676 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
AnonHugePages: 4499456 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 1475392 kB
DirectMap2M: 29980672 kB
|
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 6:50 pm Post subject: |
|
|
actually with various sizes it works as it's supposed to, quickly and without errors. tried 10k - took 21 secs. 10M took 13 secs. even 1G worked, though it took 33 secs.
but if i don't specify size manually, it just runs out of ram (which is ample) then runs out of space in /tmp. |
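A rough picture of why a tiny -S value produces so many temporary files, as mike155 describes earlier in the thread: an external merge sort fills its buffer, sorts it, spills it to a temporary "run" file, repeats until the input is exhausted, and then merges the runs. The Python sketch below only illustrates that general technique. It is not GNU sort's implementation, which measures its buffer in bytes and merges runs in batches. The names external_sort and buffer_lines are made up for this example, and the buffer is counted in lines to keep things short.

```python
import heapq
import os
import tempfile


def external_sort(lines, buffer_lines, tmpdir=None):
    """Sort newline-terminated strings, holding at most `buffer_lines`
    of them in memory before spilling to a temporary file.

    Returns (sorted_lines, number_of_temporary_runs).
    Every input line is assumed to end with a newline.
    """
    run_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and spill it to a temporary run file.
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(chunk)
        run_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= buffer_lines:
            flush()

    if not run_paths:
        # Everything fit in the buffer: no temporary files at all.
        return sorted(chunk), 0

    if chunk:
        flush()

    # Merge the sorted runs; heapq.merge reads them lazily, line by line.
    files = [open(p) for p in run_paths]
    try:
        merged = list(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
    return merged, len(run_paths)
```

Called with five input lines and buffer_lines=2, this writes three temporary runs; with a buffer large enough for the whole input it writes none, which corresponds to the in-memory case the other posters saw. The number of runs grows as the buffer shrinks, so a buffer near zero means a run file for almost every line.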
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 7:27 pm Post subject: |
|
|
I'll look through the code. Maybe I can dig something up. |
krinn Watchman
Joined: 02 May 2003 Posts: 7470
|
Posted: Fri Jul 20, 2018 7:36 pm Post subject: |
|
|
mike155 wrote: | That's weird. |
Really not weird when you see that
Code: | MemTotal: 30829708 kB
MemFree: 526752 kB |
Your system is eating memory like cake |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 8:05 pm Post subject: |
|
|
Code: | #ifdef RLIMIT_RSS
/* Leave a 1/16 margin for RSS to leave room for code, stack, etc.
Exceeding RSS is not fatal, but can be quite slow. */
if (getrlimit (RLIMIT_RSS, &rlimit) == 0 && rlimit.rlim_cur / 16 * 15 < size)
size = rlimit.rlim_cur / 16 * 15;
#endif |
this is the bit that screws up on that system. that bit makes the equivalent of -S 0b. I have no idea why. Few lines up there's another ifdef RLIMIT_AS. that one apparently is not defined. |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 8:10 pm Post subject: |
|
|
krinn wrote: | mike155 wrote: | That's weird. |
Really not weird when you see that
Code: | MemTotal: 30829708 kB
MemFree: 526752 kB |
Your system is eating memory like cake |
16 of those 30 gigs are allocated to some VMs. I have X on, gdm on, gnome3, multi-tab chrome on, wine with a game, evolution and a few terminals. and still 19 Gigs of it is available. |
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Fri Jul 20, 2018 9:36 pm Post subject: |
|
|
[text deleted, since it was wrong and axl's problem was caused by something else (see below)]
Last edited by mike155 on Fri Jul 20, 2018 10:35 pm; edited 1 time in total |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 10:00 pm Post subject: |
|
|
Again, this machine has a very high consumption of memory as it is. about half of it is usually gone by the time i even get to log in. on top of that i run a desktop on it, so it caches a lot. the memory is available. if i try to use it it's there.
I figured out the culprit. it was /etc/security/limits.conf, specifically this line:
excerpt from sort.c (starting at line 1457):
Code: |
static size_t
default_sort_size (void)
{
  /* Let SIZE be MEM, but no more than the maximum object size,
     total memory, or system resource limits. Don't bother to check
     for values like RLIM_INFINITY since in practice they are not much
     less than SIZE_MAX. */
  size_t size = SIZE_MAX;
  printf("size 1 here: %lu\n", size);
  struct rlimit rlimit;
  if (getrlimit (RLIMIT_DATA, &rlimit) == 0 && rlimit.rlim_cur < size)
    {
      size = rlimit.rlim_cur;
      printf("size 2 here: %lu\n", size);
    }
#ifdef RLIMIT_AS
  if (getrlimit (RLIMIT_AS, &rlimit) == 0 && rlimit.rlim_cur < size)
    {
      size = rlimit.rlim_cur;
      printf("size 3 here: %lu\n", size);
    }
#endif
  /* Leave a large safety margin for the above limits, as failure can
     occur when they are exceeded. */
  size /= 2;
  printf("size 4 here: %lu\n", size);
#ifdef RLIMIT_RSS
  /* Leave a 1/16 margin for RSS to leave room for code, stack, etc.
     Exceeding RSS is not fatal, but can be quite slow. */
  if (getrlimit (RLIMIT_RSS, &rlimit) == 0 && rlimit.rlim_cur / 16 * 15 < size)
    {
      size = rlimit.rlim_cur;// / 16 * 15;
      printf("size 5 here: %lu\n", size);
    }
#endif
  /* Let MEM be available memory or 1/8 of total memory, whichever
     is greater. */
  double avail = physmem_available ();
  printf("avail: %lf\n", avail);
  double total = physmem_total ();
  printf("total: %lf\n", total);
  double mem = MAX (avail, total / 8);
  printf("mem: %lf\n", mem);
  /* Leave a 1/4 margin for physical memory. */
  if (total * 0.75 < size)
    {
      size = total * 0.75;
      printf("size 6 here: %lu\n", size);
    }
  /* Return the minimum of MEM and SIZE, but no less than
     MIN_SORT_SIZE. Avoid the MIN macro here, as it is not quite
     right when only one argument is floating point. */
  if (mem < size)
    {
      size = mem;
      printf("size 7 here: %lu\n", size);
    }
  return MAX (size, MIN_SORT_SIZE);
}
|
when the line in limits.conf is commented out, sort executes like this:
size 1 here: 18446744073709551615
size 4 here: 9223372036854775807
avail: 24452755456.000000
total: 31569620992.000000
mem: 24452755456.000000
size 6 here: 23677215744
when the line in limits.conf is not commented out, sort executes like this:
size 1 here: 18446744073709551615
size 4 here: 9223372036854775807
size 5 here: 0
avail: 6833000448.000000
total: 8353812480.000000
mem: 6833000448.000000
the sequence of "size N here" lines is different: 1-4-6 with the line commented out versus 1-4-5 with it active. that is because when RLIMIT_RSS is zero, size becomes zero, which is the equivalent of running sort with --buffer-size=0 (-S 0), and that explains why it can't sort in memory or on disk. I'll mark this as solved since i figured out what caused it, and i doubt anyone else will stumble into it. |
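The arithmetic behind the two debug runs above can be replayed outside of sort. The Python sketch below mirrors the steps of the excerpt (using the `rlim_cur / 16 * 15` form of the RSS step from the unmodified code) and is fed the avail and total values axl printed. UNLIMITED and MIN_SORT_SIZE are placeholder values assumed for illustration: the real MIN_SORT_SIZE is computed in sort.c and does not appear in this thread. The guard_zero_rss switch is a hypothetical check that ignores an RSS limit of zero, and it is not taken from coreutils.

```python
SIZE_MAX = 2**64 - 1        # size_t maximum on a 64-bit system
UNLIMITED = 2**64 - 1       # assumption: an "unlimited" rlimit seen as unsigned 64-bit
MIN_SORT_SIZE = 64 * 1024   # placeholder value, NOT the real constant from sort.c


def default_sort_size(rlim_data, rlim_as, rlim_rss, avail, total,
                      guard_zero_rss=False):
    """Replay the sizing steps of the sort.c excerpt with plain integers."""
    size = SIZE_MAX
    if rlim_data < size:
        size = rlim_data
    if rlim_as < size:
        size = rlim_as

    # Large safety margin for the limits above.
    size //= 2

    # RSS step: a limit of 0 gives a cap of 0, which always "wins".
    rss_cap = rlim_rss // 16 * 15
    ignore_rss = guard_zero_rss and rlim_rss == 0   # hypothetical guard
    if rss_cap < size and not ignore_rss:
        size = rss_cap

    # Available memory or 1/8 of total memory, whichever is greater.
    mem = max(avail, total // 8)

    # Leave a 1/4 margin for physical memory.
    if total * 3 // 4 < size:
        size = total * 3 // 4
    if mem < size:
        size = mem

    return max(size, MIN_SORT_SIZE)


# First run in the thread (no RSS limit): matches "size 6 here: 23677215744".
print(default_sort_size(UNLIMITED, UNLIMITED, UNLIMITED,
                        24452755456, 31569620992))

# Second run (RSS limit of 0): collapses to the minimum buffer.
print(default_sort_size(UNLIMITED, UNLIMITED, 0,
                        6833000448, 8353812480))
```

With the RSS limit at zero the result falls all the way to the minimum buffer, no matter how much memory is available, which fits the flood of temporary files described earlier. Passing guard_zero_rss=True for the second case gives 6265359360 instead, i.e. three quarters of that machine's total memory.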
Naib Watchman
Joined: 21 May 2004 Posts: 6069 Location: Removed by Neddy
|
Posted: Fri Jul 20, 2018 10:04 pm Post subject: |
|
|
back to your original post...
Quote: | when did sort command on linux get this retarded? |
it didn't _________________ #define HelloWorld int
#define Int main()
#define Return printf
#define Print return
#include <stdio>
HelloWorld Int {
Return("Hello, world!\n");
Print 0; |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Fri Jul 20, 2018 10:07 pm Post subject: |
|
|
Naib wrote: | back to your original post...
Quote: | when did sort command on linux get this retarded? |
it didn't |
kinda is, because it shouldn't run with size zero, and it should know that. there are 2 extra lines of code that could have prevented this. or if you want to say that i am retarded then fine, i am. happy? |
Akkara Bodhisattva
Joined: 28 Mar 2006 Posts: 6702 Location: &akkara
|
Posted: Sat Jul 21, 2018 11:50 am Post subject: |
|
|
I tried your sorting test on a Raspberry-3, just to see what would happen. There's 1GB of RAM (MemTotal: 880552 kB), and a 256 MB tmpfs mounted on /tmp. It works but needs compressed temporaries:
Code: | xzcat sort.xz | sort --compress-program=gzip | uniq -c
|
Takes 112 seconds for the sort part, and 125 seconds overall.
Since you have a lot of stuff using up memory, maybe this option can help.
I noticed from the output of uniq that the file you are sorting is *highly* redundant. If this is typical, you can do better with an AWK-based solution that builds an associative array indexed by lines and counts occurrences. I'm no AWK expert, but something along the lines of this might help:
Code: | xzcat sort.xz | awk '/./ { if(++count[$0] == 1) lines[n++] = $0; } END { for(i=0; i<n; ++i) printf("%d\t%s\n", count[lines[i]], lines[i]); }'
|
... which runs in 15 seconds on the same Raspberry. But note that this version uniquifies and counts the occurrences, but leaves the lines unsorted. Pipe it thru a 'sort -k 2' to replicate the output of ... | sort | uniq -c. This last sort has almost no work to do since all the duplicates have been collapsed into one, so runs fast.
_________________ Many think that Dilbert is a comic. Unfortunately it is a documentary. |
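For readers more comfortable with Python than AWK, the same count-first idea can be sketched with collections.Counter. This is only an illustration of the approach Akkara describes, not a tested replacement for the one-liner: count_lines and format_counts are names invented here, and it assumes the input decodes cleanly as text.

```python
from collections import Counter


def count_lines(lines):
    """Count occurrences of each non-empty line, keeping first-seen order."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:                  # like awk's /./ pattern: skip empty lines
            counts[line] += 1
    return counts


def format_counts(counts, sort_output=False):
    """Render 'count<TAB>line' rows, optionally sorted by line text."""
    items = counts.items()
    if sort_output:
        items = sorted(items)     # only the unique lines get sorted
    return ["%d\t%s" % (n, line) for line, n in items]


sample = ["b\n", "a\n", "\n", "b\n", "b\n"]
for row in format_counts(count_lines(sample), sort_output=True):
    print(row)
```

Passing an open file or sys.stdin in place of the sample list would let it sit behind xzcat in a pipeline. As with the AWK version, memory use scales with the number of distinct lines rather than the size of the file, which is what makes it attractive for highly redundant input.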
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
|
Posted: Sat Jul 21, 2018 10:59 pm Post subject: |
|
|
thanks for all the kind suggestions folks. and testing on raspberry pi. thank you very kind. |