Extract text out of an image/PDF


For this purpose, you may read my previous related post here.

I am going to introduce (again) the Tesseract OCR engine. This time I am using Ubuntu 16.04, and the command to install it is:

sudo apt install tesseract-ocr

If you have a PDF and want to convert it to an image for further processing, there are various methods. One of them is ImageMagick’s convert:

convert input.pdf output.png

However, this produces a relatively low-resolution image, which may lead to poor OCR output.

So, instead we use:

convert -density 300 -quality 100 input.pdf output.png

This renders the PDF at 300 DPI (instead of the default 72) and tells convert not to reduce the output quality (quality 100).

Note that if input.pdf is a multi-page PDF, convert creates a separate image for each page, named output-0.png, output-1.png and so on.

So finally, use tesseract as:

tesseract output.png text_file -l eng

This creates text_file.txt in the current directory (tesseract adds the .txt extension itself). You can experiment with the various options of convert and tesseract to suit your needs.
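For a multi-page PDF, running tesseract by hand on every output-N.png gets tedious. Below is a minimal Python 3 sketch that wraps the same two commands with subprocess, assuming ImageMagick’s convert and tesseract are on your PATH. The function names are my own hypothetical helpers; the pages are sorted numerically so that output-10.png comes after output-2.png.

```python
import re
import subprocess
from pathlib import Path


def convert_cmd(pdf, out_png, density=300, quality=100):
    """Build the ImageMagick `convert` command used above."""
    return ["convert", "-density", str(density), "-quality", str(quality),
            str(pdf), str(out_png)]


def tesseract_cmd(image, out_base, lang="eng"):
    """Build the `tesseract` command; tesseract appends .txt to out_base."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def page_number(path):
    """Return N for names like output-N.png (0 if there is no -N suffix)."""
    match = re.search(r"-(\d+)\.png$", Path(path).name)
    return int(match.group(1)) if match else 0


def ocr_pdf(pdf, workdir="."):
    """Convert every page to PNG, OCR each page and return the text file paths."""
    workdir = Path(workdir)
    subprocess.run(convert_cmd(pdf, workdir / "output.png"), check=True)
    # A single-page PDF gives output.png; a multi-page one gives output-N.png.
    pages = sorted(workdir.glob("output*.png"), key=page_number)
    texts = []
    for page in pages:
        base = page.with_suffix("")  # output-3.png -> output-3
        subprocess.run(tesseract_cmd(page, base), check=True)
        texts.append(base.with_suffix(".txt"))
    return texts


if __name__ == "__main__":
    import sys
    for txt in ocr_pdf(sys.argv[1]):
        print(txt)
```

Run it as python3 ocr_pdf.py input.pdf (the script name is just an example); each page’s text ends up in a .txt file next to its image.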


adb not detecting the phone!


What is it?

adb stands for “Android Debug Bridge”. It comes with the Android SDK, but it can also be installed separately. It lets you send commands to an Android phone from a PC, which is helpful if you are tinkering with your phone and have reached a nearly bricked state.

In my case, I have a Zuk Z1 with TWRP recovery and Cyanogen 13 installed. Something might go wrong so that I can’t boot into the phone and even recovery isn’t working. Booting the phone into the bootloader might help then, but I didn’t find any way to reach the bootloader without going through recovery. If recovery isn’t working either, adb can help: you can use it to reboot into the bootloader (adb reboot bootloader) and then flash the recovery and the OS.

Installation

On Debian-based systems such as Ubuntu, you can install it with:

sudo apt-get install android-tools-adb android-tools-fastboot

If that doesn’t work, see here; it will get you both adb and fastboot.

My Problem:

The problem was that after installing adb, I connected the phone and ran the command:

adb devices

but it wasn’t listing my device.

After searching a bit, I found a solution.

First, you need to know your device’s vendor ID. You can search the Internet for it.

Alternatively, connect your phone to a Linux PC and type in a terminal:

lsusb -v

Then find your device in the output and look at its “idVendor” field. For example, mine is 0x2b4c.
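Since the lsusb -v output is long, a small Python 3 sketch like the following can pull out just the idVendor values. It is only a convenience over reading the output by hand; the regular expression assumes lines of the form “idVendor 0x2b4c …”, as lsusb prints them, and vendor_ids is a hypothetical helper name. Comparing the list with the phone connected and disconnected is one way to spot which ID belongs to the phone.

```python
import re
import subprocess

# Matches lines such as "  idVendor           0x2b4c <vendor name>".
ID_VENDOR = re.compile(r"^\s*idVendor\s+(0x[0-9a-fA-F]{4})", re.MULTILINE)


def vendor_ids(lsusb_verbose_output):
    """Return the distinct idVendor values (lowercased) in order of appearance."""
    seen = []
    for vid in ID_VENDOR.findall(lsusb_verbose_output):
        vid = vid.lower()
        if vid not in seen:
            seen.append(vid)
    return seen


if __name__ == "__main__":
    out = subprocess.run(["lsusb", "-v"], capture_output=True, text=True).stdout
    print(vendor_ids(out))
```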

You may also try the following command in a terminal:

fastboot devices

after booting the phone into the bootloader.

Coming back to my problem: my device wasn’t listed when running adb devices.

The solution is to create (or edit) the file ~/.android/adb_usb.ini so that it looks like this:

# ANDROID 3RD PARTY USB VENDOR ID LIST -- DO NOT EDIT.
# USE 'android update adb' TO GENERATE.
# 1 USB VENDOR ID PER LINE.
# ie 0x2207
0x2b4c

Place your vendor ID at the end, as I did above. Save the file and run adb devices again. The device should now be listed, but as unauthorized. Your phone should then pop up a permission dialog asking you to allow the connection. Allow it (and tick the “always allow” option).
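To avoid adding the same ID twice, this edit can also be scripted. The following is a minimal Python 3 sketch (add_vendor_id is a hypothetical helper name) that creates ~/.android/adb_usb.ini if needed and appends the ID only when it is not already listed; lines starting with # are treated as comments, as in the file above.

```python
from pathlib import Path

DEFAULT_INI = Path.home() / ".android" / "adb_usb.ini"


def add_vendor_id(vendor_id, ini_path=DEFAULT_INI):
    """Append vendor_id (e.g. "0x2b4c") unless already present; return True if changed."""
    ini_path = Path(ini_path)
    ini_path.parent.mkdir(parents=True, exist_ok=True)
    lines = ini_path.read_text().splitlines() if ini_path.exists() else []
    # Comment lines start with '#'; every other non-empty line is one vendor ID.
    existing = {line.strip().lower() for line in lines
                if line.strip() and not line.lstrip().startswith("#")}
    if vendor_id.lower() in existing:
        return False
    lines.append(vendor_id.lower())
    ini_path.write_text("\n".join(lines) + "\n")
    return True
```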

Then restart the adb server, running the commands with sudo:

sudo adb kill-server

and afterwards

sudo adb start-server

If it still doesn’t work, disconnect the phone and connect it again.
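To check the result without eyeballing the output, the restart and the device check can be wrapped in a small Python 3 sketch. It assumes adb is on your PATH and parses the usual “List of devices attached” format, where each line is a serial number followed by a state such as device or unauthorized. The helper names are hypothetical, and the commands run without sudo here; run the script with sudo if your setup needs it, as in the steps above.

```python
import subprocess


def parse_adb_devices(output):
    """Map serial -> state (e.g. "device", "unauthorized") from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, blank lines and daemon start-up messages ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def restart_adb_server():
    """Restart the adb server, then return the parsed device list."""
    subprocess.run(["adb", "kill-server"], check=True)
    subprocess.run(["adb", "start-server"], check=True)
    out = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return parse_adb_devices(out.stdout)
```

A state of unauthorized means the phone is visible but the permission dialog has not been accepted yet.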

Import string as Dictionary in Python


I had a Python dictionary (that looks like JSON) stored in a file, generated by some code, and I wanted to use it as a dictionary in Python. But when you read the file, its content is stored in a variable as a string, so you can’t access its keys and values directly.

So I searched for a way to convert a string to a dict. Here it is, using ast.literal_eval:

In [1]: import ast

In [2]: string = "{'name': 'xyz', 'age': 3}"

In [3]: string
Out[3]: "{'name': 'xyz', 'age': 3}"

In [4]: ast.literal_eval(string)
Out[4]: {'age': 3, 'name': 'xyz'}

In [5]: the_dict = ast.literal_eval(string)

In [6]: the_dict.keys()
Out[6]: ['age', 'name']
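Applied to the file itself, the same idea could look like the sketch below (Python 3; load_dict is a hypothetical name). ast.literal_eval only evaluates literals such as strings, numbers, dicts and lists, so unlike eval() it will not execute arbitrary code hidden in the file.

```python
import ast


def load_dict(path):
    """Read a file holding a Python dict literal and return it as a real dict."""
    with open(path) as f:
        text = f.read().strip()
    value = ast.literal_eval(text)  # raises ValueError/SyntaxError on non-literals
    if not isinstance(value, dict):
        raise ValueError("file does not contain a dict literal")
    return value
```

json.loads would not work for this particular file, since JSON requires double-quoted strings and the dict was written with single quotes.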


Import FreeCAD in Python


If you’ve compiled FreeCAD but don’t know how to import it as a module in Python, read on.

First, you need to locate the FreeCAD.so file:

locate FreeCAD.so

Mine is in /home/mandeep/Desktop/build/lib.

To be able to do:

import FreeCAD

you can go to that directory and start python there; the import will then work. But you’ll most likely find this impractical.

Another method is to append this path to sys.path. To make this permanent, you can edit the file site.py.

For example,

sys.path.append("/home/mandeep/Desktop/build/lib")

site.py is located in your Python lib directory, e.g. /usr/lib/python2.7. Since I am using Python inside a virtualenv, my lib directory is within the virtualenv, e.g. venv/lib/python2.7, where venv is my virtualenv.

Read more about this at https://docs.python.org/2/library/site.html

The reason for editing site.py is that it runs automatically every time Python starts, so the path is always added and you can import FreeCAD from any directory.
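If editing site.py feels too invasive, the same sys.path.append can be done at the top of your own script. A minimal Python 3 sketch (import_from is a hypothetical helper; the path is the one from my machine, so adjust it to what locate reports):

```python
import importlib
import os
import sys

# Directory containing FreeCAD.so on my machine; replace with your own build path.
FREECAD_LIB = "/home/mandeep/Desktop/build/lib"


def import_from(lib_dir, module_name):
    """Add lib_dir to sys.path (only once) and import module_name from it."""
    lib_dir = os.path.abspath(lib_dir)
    if not os.path.isdir(lib_dir):
        raise FileNotFoundError(lib_dir)
    if lib_dir not in sys.path:
        sys.path.append(lib_dir)
        importlib.invalidate_caches()  # make sure the new directory is searched
    return importlib.import_module(module_name)


if __name__ == "__main__":
    FreeCAD = import_from(FREECAD_LIB, "FreeCAD")
    print(FreeCAD.__file__)
```

The trade-off is that every script needs these lines, whereas the site.py change applies to every Python session.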

Source: http://www.freecadweb.org/wiki/index.php?title=Embedding_FreeCAD

Reference: http://stackoverflow.com/a/15109881/3784226

I said “sed”


Now it’s time to fetch the macros from the wiki page at http://www.freecadweb.org/wiki/index.php?title=Macros_recipes. We need to parse the links (and macro names) from there, so we need a pattern that identifies them. What the links had in common was http://freecadweb.org ?title=Macro… So the mentors suggested wrapping the links in a template that gives them a specific class.

In the wiki source, the links are written as [[Macro makeCube|Macro MakeCube]]. The first argument (before the pipe) is the name of the page (the URL). The second one is the link text that appears on the wiki. If the second argument is omitted, the first one is used as both the link text and the URL.

So the aim was to convert something like [[Macro makeCube|Macro MakeCube]] to {{MacroLink|Macro makeCube}}. Here MacroLink is the name of the template, which is replaced by the tags we specify; in this case, a span with the class “MacroLink”.

Okay, so what’s the problem? There are more than 100 entries, and editing them manually one by one would be tedious. So “sed” came to the rescue.

Next I needed a pattern to replace the text: we have to remove the content after the pipe (‘|’), replace [[ ]] with {{ }}, and add the template name followed by a pipe.

I am not good with regular expressions. While searching for something similar, I found http://stackoverflow.com/a/10613688, which gave me the idea. I used it like this:

sed 's/|Macro.*\]\]/\}\}/i' input.txt > wikinew.txt

It matches the text from “|Macro” up to “]]” (case-insensitively, because of the i flag) and replaces it with “}}”, which we need at the end. This also removes the content after the pipe, so we now have entries like [[Macro makeCube}}. Since .* is greedy, this assumes one link per line.

sed -i.bak 's/\[\[Macro/\{\{MacroLink|Macro/' wikinew.txt

Next came the prefix. The command above matches [[Macro and replaces it with {{MacroLink|Macro, which adds the “MacroLink” template. The -i.bak option edits wikinew.txt in place and keeps the original as wikinew.txt.bak. That did the job, although 2-3 entries still had to be fixed manually. The final page is here (it may no longer exist).
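For comparison, both sed steps can be expressed as a single Python 3 regular expression. This is a swapped-in alternative, not what I used for the wiki page: it captures the page name after [[ and drops any |link text. It also handles links without a pipe, such as [[Macro makeCube]], which the first sed command does not match. The name to_macro_template is hypothetical.

```python
import re

# [[Macro <name>]] or [[Macro <name>|<link text>]]; the link text is dropped.
MACRO_LINK = re.compile(r"\[\[(Macro[^|\]]*)(?:\|[^\]]*)?\]\]")


def to_macro_template(text):
    """Rewrite wiki links to Macro pages as {{MacroLink|<page name>}}."""
    return MACRO_LINK.sub(r"{{MacroLink|\1}}", text)


if __name__ == "__main__":
    import sys
    # Usage sketch: python3 macrolink.py input.txt > wikinew.txt
    with open(sys.argv[1]) as f:
        sys.stdout.write(to_macro_template(f.read()))
```

Because the name part stops at the first | or ], several links on the same line are converted independently.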

The code is https://github.com/mandeeps708/scripts/tree/master/FC-Wiki-template. I’m done!

Fetching GitHub submodules


Yesterday, I took the first step towards fetching information from GitHub. Today I added code that checks whether the repository contains submodules and returns those instances. We can then use them to access submodule information such as name, submodule_git_url etc.

See: https://developer.github.com/v3/repos/contents/#response-if-content-is-a-submodule

I used x.raw_data.get("type") to check whether x is a submodule or not. raw_data actually contains many other attributes (like html_url, git_url, name, sha, type etc.) that provide additional information about the instance itself. To fetch the original repository link, I used x.raw_data.get("submodule_git_url").
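The check described above can be sketched in Python 3 as follows. The helpers work on the raw_data dictionaries, so they can be tried without network access; fetch_raw_entries assumes a recent PyGithub release where Repository.get_contents exists (older releases may use different method names). Following the post, an entry counts as a submodule when its type field is “submodule”; the GitHub documentation linked above describes when the API reports that type. All function names here are hypothetical.

```python
def is_submodule(raw):
    """True if a contents-API entry (e.g. ContentFile.raw_data) is a submodule."""
    return raw.get("type") == "submodule"


def submodule_urls(entries):
    """Map name -> submodule_git_url for every submodule among the entries."""
    return {e.get("name"): e.get("submodule_git_url")
            for e in entries if is_submodule(e)}


def fetch_raw_entries(repo_name, path):
    """Fetch raw contents-API data for repo_name/path with PyGithub (network call)."""
    from github import Github  # PyGithub; imported here so the helpers above need no install
    contents = Github().get_repo(repo_name).get_contents(path)
    if not isinstance(contents, list):  # a single entry rather than a directory listing
        contents = [contents]
    return [c.raw_data for c in contents]
```

For example, submodule_urls(fetch_raw_entries("FreeCAD/FreeCAD-addons", some_path)) would list the submodule URLs found at some_path, where some_path is whatever path you want to inspect.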

Here is the code for the particular file: https://github.com/mandeeps708/Github-API-Fun/blob/master/get-submodules.py

More about repository methods in PyGithub: http://pygithub.github.io/PyGithub/v1/github_objects/Repository.html

Rate Limit: https://developer.github.com/v3/#rate-limiting

For requests using Basic Authentication or OAuth, you can make up to 5,000 requests per hour. For unauthenticated requests, the rate limit allows you to make up to 60 requests per hour. Unauthenticated requests are associated with your IP address, and not the user making requests.
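To stay within these limits, a script can read the X-RateLimit-Remaining and X-RateLimit-Reset headers that GitHub sends with API responses. The following Python 3 sketch (helper names are hypothetical) computes how long to wait before the next request and queries the /rate_limit endpoint using only the standard library.

```python
import json
import time
import urllib.request


def seconds_until_reset(headers, now=None):
    """How long to wait before the next request, based on GitHub's rate-limit headers.

    X-RateLimit-Remaining is the number of requests left in the current window;
    X-RateLimit-Reset is the UTC epoch second at which the window resets.
    """
    now = time.time() if now is None else now
    if int(headers.get("X-RateLimit-Remaining", 1)) > 0:
        return 0
    return max(0, int(headers["X-RateLimit-Reset"]) - now)


def check_rate_limit():
    """Return (limit, remaining, wait_seconds) for the core REST API."""
    with urllib.request.urlopen("https://api.github.com/rate_limit") as resp:
        core = json.load(resp)["resources"]["core"]
        return core["limit"], core["remaining"], seconds_until_reset(resp.headers)
```

In terms of pacing, 5,000 requests per hour is about one request every 0.72 s, while 60 requests per hour is one per minute.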

GitHub API Returns


Hi everyone,

On 11 June, I experimented with parsing the submodule data from https://github.com/FreeCAD/FreeCAD-addons.

I started working with PyGithub today (12 June). It’s a third-party Python wrapper library for the GitHub API. I spent the day struggling to get PyGithub working and couldn’t understand much at first, but as the night went on, it started working for me. 😛

I started with a basic example: fetching the names of the files. You can see the repository here: https://github.com/mandeeps708/Github-API-Fun. This repository was itself created using the API (without opening github.com in the browser). I finally went to sleep at around 7 AM.

Next is to determine whether a fetched instance is a submodule or not. That seems much easier now.