Gentoo Forums
sort [solved]
Gentoo Forums Forum Index Gentoo Chat
axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 5:07 am    Post subject: sort [solved]

sort is part of coreutils 8, at least on my system. I've been using sort for two decades, yet recently I came across a file that sort just died trying to sort.

Anyway, try downloading the list from https://dale.ro/sort.xz (decompress it with xz -d -v sort.xz) and then just run sort sort > sort1.
URL corrected. — JRG

Is this behaviour new? Because sort didn't work like this the last time I checked. I was amazed at what it does.

sort just starts writing stuff to /tmp, and on my system it fills all 15 gigs of the tmpfs. I tried with an actual hard disk as /tmp... but what is the point. In the end I wrote two PHP scripts on the fly to count what I wanted counted.

when did sort command on linux get this retarded?


Last edited by axl on Fri Jul 20, 2018 10:00 pm; edited 1 time in total

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9876
Location: almost Mile High in the USA

Posted: Fri Jul 20, 2018 7:19 am

Using sys-apps/coreutils-8.28-r1 :

mikuru /tmp $ time sort sort > sort1
real 0m27.350s
user 1m0.306s
sys 0m0.489s
mikuru /tmp $ ls -l sort sort1
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:00 sort
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:02 sort1

This is on an 8GB machine, 4GB tmpfs. Didn't seem to get the same behavior...?

Which kernel and architecture are you using?
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Fri Jul 20, 2018 7:48 am

axl: this could be a locale problem. Please try:
Code:
LC_ALL=C time sort sort > sort1

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 3:03 pm

eccerr0r wrote:
Using sys-apps/coreutils-8.28-r1 :

mikuru /tmp $ time sort sort > sort1
real 0m27.350s
user 1m0.306s
sys 0m0.489s
mikuru /tmp $ ls -l sort sort1
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:00 sort
-rw-r--r-- 1 sorter sorter 261579449 Jul 20 00:02 sort1

This is on an 8GB machine, 4GB tmpfs. Didn't seem to get the same behavior...?

Which kernel and architecture are you using?


Kernel 4.17.8, x64, coreutils 8.30.

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 3:06 pm

mike155 wrote:
axl: this could be a locale problem. Please try:
Code:
LC_ALL=C time sort sort > sort1


Same result. It takes 1-2 minutes until the tmpfs /tmp is full, then complains about running out of space, deletes its files from /tmp and quits.

What is strange is that it takes up a ton of memory (top says it's > 30%) and on top of that it is writing to disk as well.

equery k sys-apps/coreutils-8.30
* Checking sys-apps/coreutils-8.30 ...
255 out of 255 files passed

John R. Graham
Administrator
Joined: 08 Mar 2005
Posts: 10716
Location: Somewhere over Atlanta, Georgia

Posted: Fri Jul 20, 2018 3:12 pm

Make some swap: problem solved. It's wholly expected that sort will use temporary disk space—and lots of RAM—for large files; it's not a bug in any way.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 3:16 pm

16 gigs of RAM to sort a 250 MB file?

John R. Graham
Administrator
Joined: 08 Mar 2005
Posts: 10716
Location: Somewhere over Atlanta, Georgia

Posted: Fri Jul 20, 2018 5:49 pm

Hmm. Perhaps not. Did you take down your sample file? I'm not able to download it (but that may be my corporate firewall butting in). Not reproducing your results with a concocted random file: less than 3 seconds to sort.

- John

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 5:52 pm

John R. Graham wrote:
Hmm. Perhaps not. Did you take down your sample file? I'm not able to download it (but that may be my corporate firewall butting in). Not reproducing your results with a concocted random file.

- John


It's still up and visible from the internet; I checked with http://validator.w3.org/check?uri=dale.ro/sort.xz

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Fri Jul 20, 2018 5:59 pm

axl: the link you gave in your first post doesn't work. It works if you manually remove the dot at the end.

Sort works like a charm on my machine. It takes about 5 seconds of CPU time (under 2 seconds of wall-clock time) to sort your file. No temp files, everything is done in memory.
Code:
> /usr/bin/time -v sort sort > sort1
        Command being timed: "sort sort"
        User time (seconds): 4.57
        System time (seconds): 0.25
        Percent of CPU this job got: 263%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.83
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2048000
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1568
        Voluntary context switches: 407
        Involuntary context switches: 289
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

> free
              total        used        free      shared  buff/cache   available
Mem:       16323692      843904    13130536      910536     2349252    14118436
Swap:             0           0           0

John R. Graham
Administrator
Joined: 08 Mar 2005
Posts: 10716
Location: Somewhere over Atlanta, Georgia

Posted: Fri Jul 20, 2018 6:11 pm

mike155 wrote:
The link you gave in your first post doesn't work. It works, if you manually remove the dot at the end.
Ahh, good catch. Not reproducing axl's issue here either, although his data does take longer to sort than my concocted data: 9-ish seconds vs. 3-ish seconds.

- John

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Fri Jul 20, 2018 6:25 pm

It's easy to reproduce axl's issue if you try:
Code:
/usr/bin/time -v sort -S 10 sort > sort1

@axl: if your machine has plenty of RAM, sort will sort your 260MB file in memory. If your machine has only a few kB of RAM or if you limit sort's buffer size using option -S, sort will start to create zillions of temporary files in /tmp - and it will take a long time. How much RAM is available on your machine? Please post the output of
Code:
cat /proc/meminfo

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 6:42 pm

mike155 wrote:
It's easy to reproduce axl's issue if you try:
Code:
/usr/bin/time -v sort -S 10 sort > sort1

@axl: if your machine has plenty of RAM, sort will sort your 260MB file in memory. If your machine has only a few kB of RAM or if you limit sort's buffer size using option -S, sort will start to create zillions of temporary files in /tmp - and it will take a long time. How much RAM is available on your machine? Please post the output of
Code:
cat /proc/meminfo


Hmm, you are right.

time sort -S 10M sort > sort1

real 0m15.923s
user 0m14.634s
sys 0m1.201s


Code:
cat /proc/meminfo
MemTotal:       30829708 kB
MemFree:          526752 kB
MemAvailable:   19217292 kB
Buffers:            1060 kB
Cached:         20185940 kB
SwapCached:          652 kB
Active:          8974356 kB
Inactive:       20237740 kB
Active(anon):    7773144 kB
Inactive(anon):  2778600 kB
Active(file):    1201212 kB
Inactive(file): 17459140 kB
Unevictable:        2120 kB
Mlocked:            2120 kB
SwapTotal:      16777212 kB
SwapFree:       16704764 kB
Dirty:               260 kB
Writeback:             0 kB
AnonPages:       8977668 kB
Mapped:           713236 kB
Shmem:           1526648 kB
Slab:             657136 kB
SReclaimable:     479408 kB
SUnreclaim:       177728 kB
KernelStack:       17360 kB
PageTables:        81032 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    32192064 kB
Committed_AS:   24138676 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
AnonHugePages:   4499456 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1475392 kB
DirectMap2M:    29980672 kB

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 6:50 pm

Actually, with various explicit sizes it works as it's supposed to: quickly and without errors. 10k took 21 secs, 10M took 13 secs, and even 1G worked, though it took 33 secs.

But if I don't specify a size manually, it just eats through RAM (which is ample) and then runs out of space in /tmp.

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Fri Jul 20, 2018 7:20 pm

It seems that the default value for sort's buffer size is too small on your machine. That's weird. Please look here: https://stackoverflow.com/questions/37514283/gnu-sort-default-buffer-size for an explanation.

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 7:27 pm

I'll look through the code. Maybe I can dig something up.

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Fri Jul 20, 2018 7:36 pm

mike155 wrote:
That's weird.

Really not weird when you see that
Code:
MemTotal:       30829708 kB
MemFree:          526752 kB

Your system is eating memory like cake

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 8:05 pm

Code:
#ifdef RLIMIT_RSS
  /* Leave a 1/16 margin for RSS to leave room for code, stack, etc.
     Exceeding RSS is not fatal, but can be quite slow.  */
  if (getrlimit (RLIMIT_RSS, &rlimit) == 0 && rlimit.rlim_cur / 16 * 15 < size)
    size = rlimit.rlim_cur / 16 * 15;
#endif


This is the bit that goes wrong on that system: it produces the equivalent of -S 0b. I have no idea why. A few lines up there's another #ifdef RLIMIT_AS; that one apparently is not defined.

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 8:10 pm

krinn wrote:
mike155 wrote:
That's weird.

Really not weird when you see that
Code:
MemTotal:       30829708 kB
MemFree:          526752 kB

Your system is eating memory like cake


16 of those 30 gigs are allocated to some VMs. I'm running X, gdm, GNOME 3, Chrome with many tabs, Wine with a game, Evolution and a few terminals, and still 19 gigs are available.

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Fri Jul 20, 2018 9:36 pm

[text deleted, since it was wrong and axl's problem was caused by something else (see below)]

Last edited by mike155 on Fri Jul 20, 2018 10:35 pm; edited 1 time in total

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 10:00 pm

Again, this machine has very high memory consumption as it is: about half of it is usually gone by the time I even get to log in. On top of that I run a desktop on it, so it caches a lot. The memory is available, though; if I try to use it, it's there.

I figured out the culprit. It was /etc/security/limits.conf, specifically this line:
Code:
*               hard    rss             0


Excerpt from sort.c (starting at line 1457), with my debugging printf calls added and the RSS margin commented out:
Code:

static size_t
default_sort_size (void)
{
  /* Let SIZE be MEM, but no more than the maximum object size,
     total memory, or system resource limits.  Don't bother to check
     for values like RLIM_INFINITY since in practice they are not much
     less than SIZE_MAX.  */
  size_t size = SIZE_MAX;
  printf("size 1 here: %zu\n", size);
  struct rlimit rlimit;
  if (getrlimit (RLIMIT_DATA, &rlimit) == 0 && rlimit.rlim_cur < size)
  {
    size = rlimit.rlim_cur;
    printf("size 2 here: %zu\n", size);
  }
#ifdef RLIMIT_AS
  if (getrlimit (RLIMIT_AS, &rlimit) == 0 && rlimit.rlim_cur < size)
  {
    size = rlimit.rlim_cur;
    printf("size 3 here: %zu\n", size);
  }
#endif

  /* Leave a large safety margin for the above limits, as failure can
     occur when they are exceeded.  */
  size /= 2;
  printf("size 4 here: %zu\n", size);
#ifdef RLIMIT_RSS
  /* Leave a 1/16 margin for RSS to leave room for code, stack, etc.
     Exceeding RSS is not fatal, but can be quite slow.  */
  if (getrlimit (RLIMIT_RSS, &rlimit) == 0 && rlimit.rlim_cur / 16 * 15 < size)
  {
    size = rlimit.rlim_cur;  /* debug change; upstream: rlimit.rlim_cur / 16 * 15 */
    printf("size 5 here: %zu\n", size);
  }
#endif

  /* Let MEM be available memory or 1/8 of total memory, whichever
     is greater.  */
  double avail = physmem_available ();
  printf("avail: %lf\n", avail);
  double total = physmem_total ();
  printf("total: %lf\n", total);
  double mem = MAX (avail, total / 8);
  printf("mem: %lf\n", mem);

  /* Leave a 1/4 margin for physical memory.  */
  if (total * 0.75 < size)
  {
    size = total * 0.75;
    printf("size 6 here: %zu\n", size);
  }

  /* Return the minimum of MEM and SIZE, but no less than
     MIN_SORT_SIZE.  Avoid the MIN macro here, as it is not quite
     right when only one argument is floating point.  */
  if (mem < size)
  {
    size = mem;
    printf("size 7 here: %zu\n", size);
  }
  }
  return MAX (size, MIN_SORT_SIZE);
}


With the line in limits.conf commented out, sort prints:
size 1 here: 18446744073709551615
size 4 here: 9223372036854775807
avail: 24452755456.000000
total: 31569620992.000000
mem: 24452755456.000000
size 6 here: 23677215744

With the line in limits.conf active, sort prints:
size 1 here: 18446744073709551615
size 4 here: 9223372036854775807
size 5 here: 0
avail: 6833000448.000000
total: 8353812480.000000
mem: 6833000448.000000


The sequence of checkpoints is different: 1, 4, 6 in the first case versus 1, 4, 5 in the second. That is because when RLIMIT_RSS is zero, size becomes zero, so default_sort_size() can only return MIN_SORT_SIZE, which is practically the same as running sort with a near-zero --buffer-size (-S). That explains why it can't sort in memory or on disk. I'll mark this as solved, since I figured out what caused it, and I doubt anyone else will stumble into it.

Naib
Watchman
Joined: 21 May 2004
Posts: 6069
Location: Removed by Neddy

Posted: Fri Jul 20, 2018 10:04 pm

back to your original post...
Quote:
when did sort command on linux get this retarded?


it didn't
_________________
#define HelloWorld int
#define Int main()
#define Return printf
#define Print return
#include <stdio>
HelloWorld Int {
Return("Hello, world!\n");
Print 0;

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Fri Jul 20, 2018 10:07 pm

Naib wrote:
back to your original post...
Quote:
when did sort command on linux get this retarded?


it didn't


Kinda it is, because it shouldn't run with size zero, and it should know that. Two extra lines of code could have prevented this. Or if you want to say that I am retarded, then fine, I am. Happy?

Akkara
Bodhisattva
Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Sat Jul 21, 2018 11:50 am

I tried your sorting test on a Raspberry Pi 3, just to see what would happen. It has 1GB of RAM (MemTotal: 880552 kB) and a 256 MB tmpfs mounted on /tmp. It works, but needs compressed temporaries:
Code:
xzcat sort.xz | sort --compress-program=gzip | uniq -c
Takes 112 seconds for the sort part, and 125 seconds overall.

Since you have a lot of stuff using up memory, maybe this option can help.

I noticed from the output of uniq that the file you are sorting is *highly* redundant. If this is typical, you can do better with an AWK-based solution that builds an associative array indexed by lines and counts occurrences. I'm no AWK expert, but something along the lines of this might help:
Code:
xzcat sort.xz | awk '/./ { if(++count[$0] == 1) lines[n++] = $0; } END { for(i=0; i<n; ++i) printf("%d\t%s\n", count[lines[i]], lines[i]); }'

... which runs in 15 seconds on the same Raspberry. But note that this version uniquifies and counts the occurrences while leaving the lines unsorted. Pipe it through 'sort -k 2' to replicate the output of ... | sort | uniq -c. This last sort has almost no work to do, since all the duplicates have been collapsed into one, so it runs fast.
_________________
Many think that Dilbert is a comic. Unfortunately it is a documentary.

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Sat Jul 21, 2018 10:59 pm

Thanks for all the kind suggestions, folks, and for testing on a Raspberry Pi. Thank you :) Very kind.

All times are GMT