Gentoo Forums
how to pdf to text?[solved]
CaptainBlood
Advocate
Joined: 24 Jan 2010
Posts: 3864

Posted: Mon Sep 06, 2021 6:49 pm    Post subject: how to pdf to text?[solved]

Any idea which package to install for this purpose?
Thanks for your attention, interest & support.
_________________
USE="-* ..." in /etc/portage/make.conf here, i.e. a countermeasure to portage implicit braces, belt & diaper paradigm
LT: "I've been doing a passable imitation of the Fontana di Trevi, except my medium is mucus. Sooo much mucus. "


Last edited by CaptainBlood on Mon Sep 06, 2021 10:48 pm; edited 1 time in total
fedeliallalinea
Administrator
Joined: 08 Mar 2003
Posts: 31269
Location: here

Posted: Mon Sep 06, 2021 6:59 pm

I use OCRmyPDF to transform a PDF image into text.
_________________
Questions are guaranteed in life; Answers aren't.
carcajou
Apprentice
Joined: 10 Jun 2008
Posts: 248

Posted: Mon Sep 06, 2021 7:42 pm

Maybe app-text/tesseract?

Also, lately I have been using LibreOffice. It has worked quite well for me when I need to make quick PDF edits. The downside is that it usually recognizes text as a bunch of text boxes.
CaptainBlood
Advocate
Joined: 24 Jan 2010
Posts: 3864

Posted: Mon Sep 06, 2021 8:23 pm

fedeliallalinea, thanks.
Would I need any of these:
Code:
eix media-libs/jbig2enc
* media-libs/jbig2enc
     Available versions:  0.28-r1 0.29 {gif jpeg png tiff webp}
Thanks for your attention, interest & support.
CaptainBlood
Advocate
Joined: 24 Jan 2010
Posts: 3864

Posted: Mon Sep 06, 2021 9:19 pm

kukibl wrote:
Maybe app-text/tesseract?
Yes maybe.

The OCRmyPDF and tesseract docs don't seem to fit my use case:

The PDF files I wish to listen to seem to have a text layer already, as I can select text in a PDF viewer such as evince.

Currently a firefox build is holding me back from trying anything further.
It shouldn't last long, though.

Thanks
AJM
Apprentice
Joined: 25 Sep 2002
Posts: 195
Location: Aberdeen, Scotland

Posted: Mon Sep 06, 2021 9:35 pm

CaptainBlood wrote:

The OCRmyPDF and tesseract docs don't seem to fit my use case:
The PDF files I wish to listen to seem to have a text layer already, as I can select text in a PDF viewer such as evince.


How about pdftotext from app-text/poppler (might need the utils USE flag?)
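
A minimal sketch of that suggestion, assuming pdftotext from app-text/poppler is on the PATH. The helper names txt_name and pdf_to_txt are made up for illustration, and book.pdf is a placeholder file name:

```shell
#!/bin/sh
# txt_name: hypothetical helper, maps foo.pdf to foo.txt
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# pdf_to_txt: hypothetical wrapper around pdftotext
pdf_to_txt() {
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found" >&2
        return 1
    fi
    # -layout tries to keep the physical page layout of the text
    pdftotext -layout "$1" "$(txt_name "$1")"
}

# Usage (book.pdf is a placeholder):
#   pdf_to_txt book.pdf   # writes book.txt next to the PDF
```

Dropping -layout gives text in reading order instead, which may suit text-to-speech better; see man pdftotext for the full option list.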
CaptainBlood
Advocate
Joined: 24 Jan 2010
Posts: 3864

Posted: Mon Sep 06, 2021 10:47 pm

AJM wrote:
How about pdftotext from app-text/poppler (might need the utils USE flag?)
That might be the minimalistic thingie I first couldn't find a path to... :wink:
poppler[utils] was already installed; I missed it because I was looking for a standalone package. :oops:
Thanks for your attention, interest & support.
Last edited by CaptainBlood on Tue Sep 07, 2021 6:24 am; edited 1 time in total
figueroa
Advocate
Joined: 14 Aug 2005
Posts: 3005
Location: Edge of marsh USA

Posted: Tue Sep 07, 2021 4:21 am

I know you got this already, but:
Code:
$ ls /usr/bin | grep pdf

and then:
Code:
$ equery b /usr/bin/pdftotext
 * Searching for /usr/bin/pdftotext ...
app-text/poppler-21.07.0 (/usr/bin/pdftotext)

Code:
pdftotext --help

Code:
man pdftotext
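
To check whether a PDF really carries a text layer (rather than scanned images that would need OCR), one rough approach is to see if pdftotext emits any non-whitespace output. This is only a sketch: has_text is a hypothetical helper name and scan.pdf is a placeholder file.

```shell
#!/bin/sh
# has_text: hypothetical helper; succeeds if stdin holds
# at least one non-whitespace character
has_text() {
    tr -d '[:space:]' | grep -q .
}

# Usage (scan.pdf is a placeholder); "-" sends the text to stdout:
#   if pdftotext scan.pdf - | has_text; then
#       echo "text layer present"
#   else
#       echo "no extractable text, OCR is probably needed"
#   fi
```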

_________________
Andy Figueroa
hp pavilion hpe h8-1260t/2AB5; spinning rust x3
i7-2600 @ 3.40GHz; 16 gb; Radeon HD 7570
amd64/23.0/split-usr/desktop (stable), OpenRC, -systemd -pulseaudio -uefi
All times are GMT