Gentoo Forums
sh - function to pass data stdin -> stdout unchanged
Zucca
Moderator

Joined: 14 Jun 2007
Posts: 3746
Location: Rasi, Finland

Posted: Tue Nov 26, 2024 8:48 pm    Post subject: sh - function to pass data stdin -> stdout unchanged

I have this function, which takes tsv formatted stdin:
Code:
tsv_column() {
        if [ -t 1 ]
        then
                # Columnize tsv formatted input
                column -t -s $'\t'
        else
                # We're in pipe. Output raw tsv.
                # Useless use of cat?
                cat
        fi
}
Is there a way to avoid cat and use some shell redirection instead?
I was thinking about exec, but the previous state of the redirections needs to be restored after the function returns (so maybe invoke a subshell?). I'm really not sure which way works and which is the "right" way.

I'm looking for a solution that works in busybox shell and in bash. It surely doesn't hurt to have it working in dash too for example. ;)
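
A portability note on the function above, as a minimal sketch rather than an answer to the question: the $'\t' quoting is understood by bash and, as far as I know, by busybox ash, but dash has historically not supported it. Assuming dash matters, the tab can be produced with printf instead. The variable name TAB is introduced here purely for illustration.

```shell
# Assumption: TAB is a helper variable, not part of the original function.
# Command substitution strips only trailing newlines, so the tab survives.
TAB=$(printf '\t')

tsv_column() {
    if [ -t 1 ]; then
        # stdout is a terminal: columnize the tsv input
        column -t -s "$TAB"
    else
        # stdout is a pipe or a file: pass the raw tsv through
        cat
    fi
}
```

The behaviour is meant to be the same as the original; only the way the tab character is obtained differs.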
_________________
..: Zucca :..

My gentoo installs:
init=/sbin/openrc-init
-systemd -logind -elogind seatd

Quote:
I am NaN! I am a man!


Last edited by Zucca on Tue Nov 26, 2024 9:32 pm; edited 1 time in total

szatox
Advocate

Joined: 27 Aug 2013
Posts: 3461

Posted: Tue Nov 26, 2024 9:01 pm

Can you pass the second command either as a parameter to your function or through the environment, and then call it from within your function, either through a pipe or directly, depending on which branch you hit?

Actually, do you even need a function for just a single test with one-line paths?
_________________
Make Computing Fun Again

pingtoo
Veteran

Joined: 10 Sep 2021
Posts: 1289
Location: Richmond Hill, Canada

Posted: Tue Nov 26, 2024 9:10 pm

Couldn't you use read and echo to replace cat?

like while read line; do
echo $line
done

Zucca
Moderator

Joined: 14 Jun 2007
Posts: 3746
Location: Rasi, Finland

Posted: Tue Nov 26, 2024 9:23 pm

@pingtoo, I think cat performs better than echo inside a loop.

@szatox, I didn't quite follow. The point of this function is to recognize whether the user ran the script inside a pipeline, so that it won't columnize the output but will keep the original tsv formatting for easier parsing.
EDIT: My wording in the first post was maybe misleading. EDIT2: Edited the first post.


Last edited by Zucca on Tue Nov 26, 2024 9:34 pm; edited 2 times in total

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22727

Posted: Tue Nov 26, 2024 9:25 pm

pingtoo wrote:
Couldn't you use read and echo to replace cat?

like while read line; do
echo $line
done
That would be a poor reimplementation of cat, and likely less efficient too. I think the goal is for the unidentified producer to be directly connected to the unidentified consumer. I'm not aware of a way to do what OP wants. I think szatox is on the right track, by avoiding even creating the producer's stdout connected to a pipe that the shell later needs to proxy to the consumer.

Zucca, could you show example uses of the interactive and non-interactive flow?

pingtoo
Veteran

Joined: 10 Sep 2021
Posts: 1289
Location: Richmond Hill, Canada

Posted: Tue Nov 26, 2024 9:29 pm

Zucca wrote:
@pingtoo, I think cat performs better than echo inside a loop.
I have not tested this myself, so maybe my thinking is wrong.

But what if IFS is set to nothing (IFS=)? Would read then get the entire output from the pipe, making it effectively the same as cat?

Zucca
Moderator

Joined: 14 Jun 2007
Posts: 3746
Location: Rasi, Finland

Posted: Tue Nov 26, 2024 9:47 pm

Hu wrote:
Zucca, could you show example uses of the interactive and non-interactive flow?
Code:
command_which_produces_tsv_data | tsv_column
... here tsv_column() would format the tsv data to more human readable form by columnizing the data.
Code:
command_which_produces_tsv_data | tsv_column | some_other_command
... here tsv_column() would pass the tsv data as-is so that some_other_command can easily parse it.

The point is that tsv_column() lives inside a script, which would normally output neatly columnized data, but if the user decides to pipe the stdout of that script into somewhere, then the data formatting would be preserved for later processing.

I intend to add some way to force either of the formats, but that's really not the question here.

Can I have a shell function pass the data from stdin to stdout without cat (not using any command, but some clever redirection maybe)? I was thinking of
Code:
0>&1
, but that doesn't work.
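
Why the redirection alone does nothing can be seen in a small sketch. 0>&1 only makes descriptor 0 refer to whatever descriptor 1 refers to; it leaves no process reading from the pipe and writing to stdout, so no data moves. The function name below is made up for the demonstration.

```shell
# Hypothetical name; ':' is the shell's no-op builtin.
passthrough_attempt() {
    # The redirection re-points fd 0 at fd 1 for the duration of ':',
    # but nothing ever reads the data waiting on the original stdin.
    : 0>&1
}

# The producer's line is never copied, so this pipeline prints nothing.
printf 'hello\n' | passthrough_attempt
```

Something has to do the read and the write, which is the job cat performs in the original function.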

szatox
Advocate

Joined: 27 Aug 2013
Posts: 3461

Posted: Tue Nov 26, 2024 10:21 pm

Ah, ok, had a brain fart here. After a second read I get what you want.
Quote:
I was thinking about exec, but the previous state of the redirections needs to be restored after the function returns (so maybe invoke a subshell?)

First, if you want it to connect, you need an outlet and a plug. You can't just rub 2 outlets together and expect the power to flow, if you know what I mean, it's like, a part with a hole needs a part with a peg for things to click :lol:
Sure, go ahead and prove me wrong, but I say cleanup won't be a problem because redirecting 0 to 1 with "exec" won't even "cat" it.

Second, a subshell wouldn't be any better than cat: either of those spawns another process and does all the same magic, so keep it simple. I really doubt you can get anything better than what you started with. You do actually need that cat to serve as a wire connecting your 2 file descriptors.
Unless you can cheat your way out of it entirely. But as far as we know, the data comes from stdin. If you want to cut the pipeline short, you'd basically have to figure out whether or not you're in a pipeline outside of your script, and invoke either column or whatever program consumes the data stream instead of cat.

I've been thinking about something like this:
Code:
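$ f () {
    if [ -n "$1" ]
    then
        # First argument is the optional consumer; the rest are its options
        command="$1"
        shift
        # exec replaces the current (sub)shell, so call f as a pipeline stage
        exec "$command" "$@"
    else
        # No consumer given: columnize the tsv input
        column -t -s "$(printf '\t')"
    fi
}
$ f optional_data_consumer possibly with options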



Last edited by szatox on Tue Nov 26, 2024 10:42 pm; edited 3 times in total

pingtoo
Veteran

Joined: 10 Sep 2021
Posts: 1289
Location: Richmond Hill, Canada

Posted: Tue Nov 26, 2024 10:35 pm

I would bet my version of using "read" and "echo" together with 'IFS=' is just as efficient as cat in a pipeline. Maybe the while loop can be removed, because the "read" and "echo" happen in the middle of the pipeline anyway.

If I had DTrace set up, I could trace the syscalls in that context and compare them to "cat" in the middle of a pipe.

I think my version saves the cost of invoking an external program, since "read" and "echo" are built into every shell.

However, I admit that if the invocation logic can be changed to detect the pipe condition before invoking tsv_column, then "cat" is more efficient than a read/echo loop.

Zucca
Moderator

Joined: 14 Jun 2007
Posts: 3746
Location: Rasi, Finland

Posted: Tue Nov 26, 2024 10:54 pm

pingtoo wrote:
I would bet my version of using "read" and "echo" together with 'IFS=' is just as efficient as cat in a pipeline.
Code:
zucca@NBLK-WAX9X ~ $ tr -dc 'a-z\t[:space:]A-Z0-9\n' < /dev/urandom | head -c $((1024*1024*512)) > /tmp/test.file
zucca@NBLK-WAX9X ~ $ time cat /tmp/test.file > /dev/null

real   0m0.127s
user   0m0.003s
sys   0m0.122s
zucca@NBLK-WAX9X ~ $ time while IFS= read l; do echo $l; done < /tmp/test.file > /dev/null
^C
real   2m36.665s
user   2m24.975s
sys   0m8.615s
@pingtoo, as you see I gave up after 2m 30s mark. ;)
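
Speed aside, the loop as benchmarked is also not a byte-for-byte copy of its input. The sketch below puts it next to a more careful variant; both function names are illustrative, and the careful variant is only an assumption about how one would tighten the loop, not something proposed in the thread.

```shell
# The loop as timed above (hypothetical wrapper name).
naive_copy() {
    # read without -r eats backslashes; the unquoted $l is word-split,
    # so runs of whitespace collapse and glob characters may expand.
    while IFS= read l; do echo $l; done
}

# A more faithful variant (still not a replacement for cat).
careful_copy() {
    # -r keeps backslashes, quoting keeps whitespace, printf avoids
    # echo's option and escape handling; the || test keeps a final
    # line that lacks a trailing newline.
    while IFS= read -r l || [ -n "$l" ]; do
        printf '%s\n' "$l"
    done
}
```

Fed a line containing a literal backslash and several consecutive spaces, naive_copy drops the backslash and collapses the spaces, while careful_copy reproduces the line. Even the careful variant appends a newline to an unterminated last line and cannot carry NUL bytes, and there is no reason to expect it to be any faster.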

Zucca
Moderator

Joined: 14 Jun 2007
Posts: 3746
Location: Rasi, Finland

Posted: Tue Nov 26, 2024 10:57 pm

szatox wrote:
First, if you want it to connect, you need an outlet and a plug. You can't just rub 2 outlets together and expect the power to flow, if you know what I mean, it's like, a part with a hole needs a part with a peg for things to click :lol:
Yes, that's my problem. It's not terribly inefficient to use cat, but...

eschwartz
Developer

Joined: 29 Oct 2023
Posts: 238

Posted: Tue Nov 26, 2024 11:56 pm

"cat" is an unbelievably efficient program that is extremely good at doing what it does. People have written blog posts about how they tried to rewrite cat in Rust for the fearless concurrency and blazing fast speed, and were shocked to discover that rust was in fact extremely slow and it's because they incorrectly thought all you need to do is "just read data from one end and print it on the other end". It turns out that smart programmers who have spent decades optimizing a program have a few tricks up their sleeves and it has nothing to do with programming languages (yes, bash is a programming language, and bash does NOT provide primitives for the functionality the GNU cat developers have relied on here).

You will also find that "cat" is used all over the place as basically the canonical way to implement a pipeline filter that sometimes does something, but in the "cat" case, does nothing. There is a reason for this. The only faster thing to do would be to complexify a different part of your program by introducing branching that determines whether or not to invoke a pipeline filter.

pingtoo
Veteran

Joined: 10 Sep 2021
Posts: 1289
Location: Richmond Hill, Canada

Posted: Wed Nov 27, 2024 1:11 am

Zucca wrote:
pingtoo wrote:
I would bet my version of using "read" and "echo" together with 'IFS=' is just as efficient as cat in a pipeline.
Code:
zucca@NBLK-WAX9X ~ $ tr -dc 'a-z\t[:space:]A-Z0-9\n' < /dev/urandom | head -c $((1024*1024*512)) > /tmp/test.file
zucca@NBLK-WAX9X ~ $ time cat /tmp/test.file > /dev/null

real   0m0.127s
user   0m0.003s
sys   0m0.122s
zucca@NBLK-WAX9X ~ $ time while IFS= read l; do echo $l; done < /tmp/test.file > /dev/null
^C
real   2m36.665s
user   2m24.975s
sys   0m8.615s
@pingtoo, as you see I gave up after 2m 30s mark. ;)
Thank you. I always overthink things. You gave me a good answer.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22727

Posted: Wed Nov 27, 2024 2:26 am

The shown pipeline puts everything in a single shell, but if the | tsv_column part is buried in a script, and the potentially present | some_other_command isn't, then that's not the same pipeline as shown.

You can play games with exec to save aside and restore stdout, and if you commit to using Bash, you can use process substitution to conditionally insert your final stage. I think if you want non-Bash Bourne, then you will need to follow szatox's suggestion.
Code:
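#!/bin/sh

# Plain name() syntax: the "function" keyword is not portable sh
real_work() {
    # ...
    :   # placeholder so that the function body is not empty
}

if [ -t 1 ]; then
    # stdout is a terminal: columnize for humans
    real_work | column -t -s "$(printf '\t')"
else
    # stdout is a pipe or a file: emit the raw tsv
    real_work
fi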
Then put everything else inside real_work, so that it is conditionally invoked either with its stdout piped to column or with its stdout preserved.
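
The "save aside and restore stdout" trick mentioned above can be sketched in plain sh as follows. It assumes descriptor 3 is free, and the function name is invented for the example. It shows that redirection state can be restored without a subshell, although exec by itself still copies no data from stdin to stdout.

```shell
# Hypothetical demonstration function; fd 3 is assumed to be unused.
with_saved_stdout() {
    exec 3>&1            # keep a copy of the current stdout on fd 3
    exec 1>/dev/null     # temporarily send stdout elsewhere
    echo "this line is discarded"
    exec 1>&3 3>&-       # restore stdout and close the spare descriptor
    echo "stdout restored"
}
```

Calling with_saved_stdout prints only the second message, because the first echo runs while stdout points at /dev/null.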