Gentoo Forums
read a file from the end to the beginning
Gentoo Forums Forum Index Portage & Programming

SarahS93
l33t

Joined: 21 Nov 2013
Posts: 712

Posted: Fri Feb 23, 2024 3:41 am    Post subject: read a file from the end to the beginning

How can I read a file from the end to the beginning?

Can "tac" do this?
If I read a file with tac and write the result to a new file, and then read the new file with tac again, the result is not the same as the original file.

Code:
dd if=/dev/urandom of=test1gb bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.26988 s, 473 MB/s

tac test1gb   > test1gb_1
tac test1gb_1 > test1gb_2

md5sum test1gb*
09118633eeb78190f81dac3d59ca9b5a  test1gb
0c27771a14ab8758d6b6faa378adbd5d  test1gb_1
c266a73814411dd824b3416595707896  test1gb_2


Is there another program for reading a file from the end to the beginning?

freke
Veteran

Joined: 23 Jan 2003
Posts: 1006
Location: Somewhere in Denmark

Posted: Fri Feb 23, 2024 9:48 am

I suspect this has to do with the type of file - works fine for ASCII-text files.

Can be done with binary files with
Code:
ns ~ # < test1gb xxd -p -c1 | tac | xxd -p -r > test1gb_1
ns ~ # < test1gb_1 xxd -p -c1 | tac | xxd -p -r > test1gb_2
ns ~ # md5sum test1gb*
06d8ec759353542e1a9ac33c0018a302  test1gb
3202e50b76d5d33506945670f12dd760  test1gb_1
06d8ec759353542e1a9ac33c0018a302  test1gb_2


xxd is from app-editors/vim-core
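
For comparison, the same byte-wise reversal can be sketched in Python for data that fits in memory. This is only an illustration, not what the xxd pipeline does internally; the helper name `reverse_bytes` and the 1 KiB random sample are assumptions made for the example. It shows the property the md5sums above demonstrate: reversing twice gives back the original.

```python
import hashlib
import os


def reverse_bytes(data: bytes) -> bytes:
    """Return the bytes of `data` in reverse order (hypothetical helper)."""
    return data[::-1]


# Small random sample standing in for the 1 GB test file (assumption).
sample = os.urandom(1024)
once = reverse_bytes(sample)
twice = reverse_bytes(once)

# Reversing twice is the identity, so the two checksums are equal again.
print(hashlib.md5(sample).hexdigest() == hashlib.md5(twice).hexdigest())
```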

Goverp
Advocate

Joined: 07 Mar 2007
Posts: 2028

Posted: Fri Feb 23, 2024 10:28 am    Post subject: Re: read a file from the end to the beginning

SarahS93 wrote:
How can I read a file from the end to the beginning?

Can "tac" do this?
If I read a file with tac and write the result to a new file, and then read the new file with tac again, the result is not the same as the original file.

Code:
dd if=/dev/urandom of=test1gb bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.26988 s, 473 MB/s

tac test1gb   > test1gb_1
tac test1gb_1 > test1gb_2

md5sum test1gb*
09118633eeb78190f81dac3d59ca9b5a  test1gb
0c27771a14ab8758d6b6faa378adbd5d  test1gb_1
c266a73814411dd824b3416595707896  test1gb_2


Is there another program for reading a file from the end to the beginning?

My guess is this would be down to handling of trailing linefeeds and/or embedded nulls. I'd suggest this is a meaningless test, since I assume your objective is not to read 1GB of random data, either forwards or backwards. With meaningful data you could run "diff" on the two files to see what the change was, but with 1 GB of random stuff (and therefore random length lines) it will be hard. Using test data with embedded nulls is good for the soul, and for discovering coding mistakes, but it's not necessarily useful for determining how a program should work with the actual application data.

If you are considering reading data in C, then it's possible to write a program to read line by line starting from the end (no, I don't have an example to hand, but I'm sure they're out there); if the objective is to read byte by byte backwards from the end, that's even easier, though unlikely to be much use! If you want to write shell script, or use *nix utilities to process lines in a proper text file backwards, tac will almost certainly be what you want.
_________________
Greybeard

szatox
Advocate

Joined: 27 Aug 2013
Posts: 3206

Posted: Fri Feb 23, 2024 11:37 am

Tac reverses the order of lines in a file, is this what you want?
Works fine for text files, but the result won't be even close to what you'd expect from reading a binary file backwards.

Also, when reading a binary file backwards: what is the word size you want to use? Big or little endian?


Quote:
My guess is this would be down to handling of trailing linefeeds and/or embedded nulls
Maybe. Tac allows you to manually set a separator (check man tac), so there is a chance it might kinda work with arbitrary files, but it is still a sketchy scenario.
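
Why a line-reversing round trip can fail on arbitrary data can be shown with a simplified model of line reversal. This is only a sketch, not GNU tac's actual implementation: the helper `tac_like` is hypothetical and assumes that each record keeps the separator that follows it, so a final record without a trailing newline gets glued to the record that comes after it in the output.

```python
def tac_like(data: bytes, sep: bytes = b"\n") -> bytes:
    """Reverse the order of separator-terminated records (simplified model)."""
    records = []
    start = 0
    while True:
        idx = data.find(sep, start)
        if idx == -1:
            break
        # Each record keeps the separator that terminates it.
        records.append(data[start:idx + len(sep)])
        start = idx + len(sep)
    if start < len(data):
        # Trailing record that is not terminated by a separator.
        records.append(data[start:])
    return b"".join(reversed(records))


print(tac_like(b"a\nb\n"))            # b'b\na\n'
print(tac_like(tac_like(b"a\nb\n")))  # b'a\nb\n' (round trip works)
print(tac_like(b"a\nb"))              # b'ba\n'
print(tac_like(tac_like(b"a\nb")))    # b'ba\n' (original is not restored)
```

Random data almost never ends in a newline, so under this model the second pass cannot restore the original bytes.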
_________________
Make Computing Fun Again

SarahS93
l33t

Joined: 21 Nov 2013
Posts: 712

Posted: Fri Feb 23, 2024 12:23 pm

The xxd, tac and xxd approach works well for binary files, thanks.

Is there a way to run this through pv to see a status bar?

Genone
Retired Dev

Joined: 14 Mar 2003
Posts: 9559
Location: beyond the rim

Posted: Fri Feb 23, 2024 12:44 pm

freke wrote:
I suspect this has to do with the type of file - works fine for ASCII-text files.

Can be done with binary files with
Code:
ns ~ # < test1gb xxd -p -c1 | tac | xxd -p -r > test1gb_1
ns ~ # < test1gb_1 xxd -p -c1 | tac | xxd -p -r > test1gb_2
ns ~ # md5sum test1gb*
06d8ec759353542e1a9ac33c0018a302  test1gb
3202e50b76d5d33506945670f12dd760  test1gb_1
06d8ec759353542e1a9ac33c0018a302  test1gb_2


xxd is from app-editors/vim-core


One thing to note here: When you're using tac in a pipe like this it will read the entire pipe content first into memory (there isn't really any option to avoid that) before writing it in reverse order to stdout. So if you want to use this on multi-gigabyte files you can easily run out of memory. If you use tac with a filename argument it can avoid that (as it can check how large the file is and seek to the end), but then you have to use temporary files to store the output of your initial filters which will cost storage space and be slower.


Last edited by Genone on Fri Feb 23, 2024 12:51 pm; edited 1 time in total

Genone
Retired Dev

Joined: 14 Mar 2003
Posts: 9559
Location: beyond the rim

Posted: Fri Feb 23, 2024 12:50 pm    Post subject: Re: read a file from the end to the beginning

Goverp wrote:
My guess is this would be down to handling of trailing linefeeds and/or embedded nulls. I'd suggest this is a meaningless test, since I assume your objective is not to read 1GB of random data, either forwards or backwards. With meaningful data you could run "diff" on the two files to see what the change was, but with 1 GB of random stuff (and therefore random length lines) it will be hard. Using test data with embedded nulls is good for the soul, and for discovering coding mistakes, but it's not necessarily useful for determining how a program should work with the actual application data.

Given OPs history, the application data is likely random (encrypted) binary data, and reversing is just used for obfuscation.

SarahS93
l33t

Joined: 21 Nov 2013
Posts: 712

Posted: Fri Feb 23, 2024 11:51 pm

Quote:
Given OPs history, the application data is likely random (encrypted) binary data, and reversing is just used for obfuscation.

you are right.

if i do the tac xxd thing, the read/write speed is about 100mb/s. is the used file system (ext4) the limitation?

freke
Veteran

Joined: 23 Jan 2003
Posts: 1006
Location: Somewhere in Denmark

Posted: Sat Feb 24, 2024 12:38 am

I don't think that's fs limited, but rather hw - I get similar timings on both tmpfs and ext4 (~37-40 secs for the 1gb file)

SarahS93
l33t

Joined: 21 Nov 2013
Posts: 712

Posted: Wed Feb 28, 2024 7:19 am

Are there other ways to read a file in reverse?
With Python or Perl? Are they faster?
Or how can the xxd and tac approach be sped up?

szatox
Advocate

Joined: 27 Aug 2013
Posts: 3206

Posted: Wed Feb 28, 2024 12:50 pm

Quote:
Are there other ways to read a file in reverse?
With Python or Perl? Are they faster?
Or how can the xxd and tac approach be sped up?
You _can_ write a byte reverse yourself in any language you want. You can even do it in C, file access and array of bytes are simple enough anyone can get there with a bit of Copy-Paste.
But, since you're asking about performance, let me ask you something before you start:
Do you know what DMA is?
Do you know how sequential memory access differs from random memory access?
Do you know how many bytes a 64-bit CPU can copy in a single instruction, and how many it can copy when reversing?

Going down the list makes your code slower and slower.
You _can't_ reverse at full speed. It's not going to happen. And since the data is already encrypted, there is no reason to mangle it even more, which means you could completely avoid running this code altogether for the best performance possible.
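
A rough way to see the cost of reversing compared with a plain copy is to time both on an in-memory buffer. This is a minimal sketch, assuming CPython; the function names and the 1 MiB sample size are made up for the example, and the numbers it prints depend entirely on the machine, so none are quoted here.

```python
import os
import timeit

data = os.urandom(1024 * 1024)  # 1 MiB of random bytes (sample size is an assumption)


def plain_copy(buf: bytes) -> bytearray:
    # Straight copy of the buffer, no reordering.
    return bytearray(buf)


def reverse_per_byte(buf: bytes) -> bytes:
    # Walks the buffer one byte at a time through a Python iterator.
    return bytes(reversed(buf))


def reverse_slice(buf: bytes) -> bytes:
    # A single slice operation; the loop runs inside the interpreter's C code.
    return buf[::-1]


# Both reversal variants must produce the same result.
assert reverse_per_byte(data) == reverse_slice(data)

for fn in (plain_copy, reverse_per_byte, reverse_slice):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```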
_________________
Make Computing Fun Again

Genone
Retired Dev

Joined: 14 Mar 2003
Posts: 9559
Location: beyond the rim

Posted: Mon Mar 04, 2024 8:54 am

SarahS93 wrote:
if i do the tac xxd thing, the read/write speed is about 100mb/s. is the used file system (ext4) the limitation?


Unlikely. If you're reading/writing from/to a mechanical HDD that performance is probably as good as you will get. Otherwise you'd have to double-check where the bottleneck is.

Quote:
Are there other ways to read a file in reverse?
With Python or Perl? Are they faster?


If the bottleneck is hardware, then changing the implementation won't do much. And of course it can be implemented in basically any programming language, e.g. a basic python implementation (untested) could look like

Code:

import os

ifile = open("path/to/input/file", "rb")
ofile = open("path/to/output/file", "wb")

isize = os.stat("path/to/input/file").st_size
bs = 1024 * 1024  # block size: 1 MiB

# offset of the last (possibly partial) block; // keeps pos an integer
pos = (isize // bs) * bs
while pos > 0:
    ifile.seek(pos)
    data = ifile.read(bs)
    data = bytes(reversed(data))
    ofile.write(data)
    pos -= bs

# process first block of data (pos is 0 here, so at most bs bytes are used)
ifile.seek(0)
data = ifile.read(bs)[:pos + bs]
data = bytes(reversed(data))
ofile.write(data)

ifile.close()
ofile.close()


Certainly not the most optimized version, but you get the idea.

EDIT: Removed syntax errors


Last edited by Genone on Wed Jun 12, 2024 8:09 am; edited 2 times in total

SarahS93
l33t

Joined: 21 Nov 2013
Posts: 712

Posted: Sun Jun 09, 2024 1:55 am

Genone,
I wrote your code into test.py,
changed the input and output files and tried to run it:
Code:
if = open("test1gb", "rb")
of = open("test1gb_8", "wb")

isize = os.stat("test1gb").st_size
bs = 1024 * 1024

pos = (isize / bs) * bs
while pos > 0:
    if.seek(pos)
    data = if.read(bs)
    data = bytes(reversed(data))
    of.write(data)
    pos -= bs

# process first block of data
if.seek(0)
data = if.read(bs)[:pos + bs]
data = bytes(reversed(data))
of.write(data)

if.close()
of.close()


Code:
./test.py: Zeile 1: Syntaxfehler beim unerwarteten Symbol »(«
./test.py: Zeile 1: `if = open("path/to/input/file", "rb")'


i found this too
perl -0777pe '$_=reverse $_' test1gb > test1gb_1
It is very fast because perl reads the whole file into memory and then writes it out reversed.

how is it possible to add a status bar like pv to this command?!

Banana
Moderator

Joined: 21 May 2004
Posts: 1484
Location: Germany

Posted: Sun Jun 09, 2024 7:58 am

I've also not tested the python script from Genone, but please make sure you have not copied some invisible garbage into your script. Make sure the parentheses are real ( characters and not look-alikes.

Quote:
how is it possible to add a status bar like pv to this command?!

As far as I know the pv command, it will not work here because there is no pipe involved: it is a single command, not more than one. https://linux.die.net/man/1/pv
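
If the reversal is done in a script of its own rather than in the perl one-liner, the script can print its own progress. A minimal sketch, assuming a block-wise approach like the one Genone posted; the function name `reverse_file`, its parameters and the percentage format are all made up for this example.

```python
import os
import sys


def reverse_file(src: str, dst: str, block_size: int = 1024 * 1024) -> None:
    """Write the bytes of `src` to `dst` in reverse order, printing progress."""
    size = os.path.getsize(src)
    done = 0
    with open(src, "rb") as infile, open(dst, "wb") as outfile:
        pos = size
        while pos > 0:
            # Read the block that ends at `pos`, working towards offset 0.
            start = max(0, pos - block_size)
            infile.seek(start)
            block = infile.read(pos - start)
            outfile.write(block[::-1])
            done += len(block)
            pos = start
            # Progress goes to stderr so it stays separate from normal output.
            print(f"\r{done * 100 // size}%", end="", file=sys.stderr)
    if size:
        print(file=sys.stderr)
```

Calling `reverse_file("test1gb", "test1gb_1")` would then show a percentage that counts up while the file is written; only one block is held in memory at a time.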
_________________
My personal space
My delta-labs.org snippets do expire

PFL - Portage file list - find which package a file or command belongs to.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54456
Location: 56N 3W

Posted: Sun Jun 09, 2024 8:56 am

SarahS93,

Reading a file in reverse will always be slow by design.
The HDD tries to keep a cache but that by design reads subsequent blocks, not previous ones.

The kernel filesystem cache does something similar.
You need to defeat the cache systems that have been developed since before the IBM PC was a thing.

To deal with files on a block device, work with block size data elements, not individual bytes.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Genone
Retired Dev

Joined: 14 Mar 2003
Posts: 9559
Location: beyond the rim

Posted: Tue Jun 11, 2024 1:52 pm

SarahS93 wrote:

Code:
./test.py: Zeile 1: Syntaxfehler beim unerwarteten Symbol »(«
./test.py: Zeile 1: `if = open("path/to/input/file", "rb")'


*sigh* That error looks like you attempted to execute the code using bash instead of python which obviously won't work.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 21924

Posted: Tue Jun 11, 2024 1:58 pm

Also, it looks like the environment again does not have LC_MESSAGES=C, so the error messages are localized, which makes it harder for us to understand the problem.

However, the shown statement is not valid Python either, because if is a reserved word in python.
Code:
$ bash -c 'if = open("a", "r")'
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `if = open("a", "r")'
$ python -c 'if = open("a", "r")'
  File "<string>", line 1
    if = open("a", "r")
       ^
SyntaxError: invalid syntax
I agree that the shown error looks much more like the bash output than the Python output.

Changing if to a non-keyword, such as input_file would avoid the SyntaxError.
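
Whether a name is safe to use as a variable can be checked with Python's standard `keyword` module. A small sketch; the three names tested are just the ones that came up in this thread.

```python
import keyword

# "if" is reserved; "of" and "input_file" are ordinary identifiers.
for name in ("if", "of", "input_file"):
    print(name, keyword.iskeyword(name))
```

So only `if` had to be renamed; `of` on its own would have been accepted.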

logrusx
Veteran

Joined: 22 Feb 2018
Posts: 1757

Posted: Tue Jun 11, 2024 4:01 pm

Just posting that here:

https://unix.stackexchange.com/questions/416401/how-to-reverse-the-content-of-binary-file

Best Regards,
Georgi

All times are GMT