GITenberg is an open source community for publishing ebooks in the public domain

gitenberg.org

214 points by bilinualcom 5 years ago · 42 comments

gravitas 5 years ago

The https://standardebooks.org/ project, which might be of interest, has been rebuilding ebooks (open ePUB format) with an eye on quality and readability on phones and tablets (fonts, copyediting, etc.). All books and website revisions are tracked in git.

  • sethish 5 years ago

    I'm a big fan of standardebooks! They're doing fantastic work producing better ebooks from Project Gutenberg sources.

    • bilinualcomOP 5 years ago

      OP here. I found this website while looking for a way to get the updated version (with corrections) of a PG (Project Gutenberg) book, along with all changes/diffs since I scraped it from the PG website, for my language-learning side project: https://www.bilinual.com

      The bilinual project also rebuilds ebooks (modern HTML, PDF, and open ePUB formats) with better quality and readability, although that is not its primary goal. Take a look at one example here:

      https://www.bilinual.com/book/18043sven/sv/en#line=68&lpp=23
      https://www.bilinual.com/download/18043sven-sv-en.pdf
      https://www.bilinual.com/download/18043sven-sv-en.epub

      • bscphil 5 years ago

        I tried to pull up a couple of books on the front page to check out what you had, and

        * One of them 404ed: https://www.bilinual.com/download/30117fren-fr-en.pdf

        * The other was full of problems: https://www.bilinual.com/download/16210fren-fr-en.pdf

        Many words don't have translations at all, and those that do are often incorrect. Is this a very rough machine translation? For example:

        > et c'est surtout dans les paroisses riveraines du Saint-Laurent

        You translate this

        > and Ce east primarily in the · · some saint Laurence

        While Google Translate gives

        > and it is especially in the parishes bordering the St.Lawrence

        If you're using machine translation, why not use a Google API that might give usable results at least? If that's not plausible, maybe you should try to get together a team of volunteers to manually translate these ebooks for language learners?

        (I hope these suggestions are helpful, I'm not trying to be dismissive of your project.)

        • bilinualcomOP 5 years ago

          Hi, thanks for checking out the website.

          1- 404 issue: I implemented PDF generation recently and noticed that WeasyPrint has trouble with HTML files that contain too many tags (our books have around 2*number_of_words tags in them). This is not a big issue and will be fixed in the next iteration.

          2- Using the Google API: Google APIs and other translation tools are great for translating sentences. However, the problem with using parallel texts for language learning is that our brains are lazy. After a few pages, the brain loses patience for working through the translation problems (critical thinking!?) and actually learning words and sentence structure. Attention immediately shifts to the translated sentences in your native language rather than the original text.

          Personally, I learn a word for life when I slow down, think about similar words and its root, and finally look it up in a dictionary. That process is valuable.

          3- Team of volunteers: easier said than done. The functionality is there, but I'd prefer to improve the suggestion engine as much as possible before involving volunteers. Would you be interested in joining?

        • nix23 5 years ago

          >If you're using machine translation, why not use a Google API that might give usable results at least?

          I prefer https://www.deepl.com/translator to Google.

      • yesenadam 5 years ago

        Wow, they look awesome, thank you! (Learning Spanish here)

        • yesenadam 5 years ago

          Looking at Unamuno's Abel Sanchez..

          'hermanos' is translated 'brethren' - super-archaic.

          p8, 'dedicado' in 'te has dedicado a pintar?' is translated 'hardcore'.

          'sí' in 'Que sí, hombre' is translated 'do', as in do-re-mi, I guess.

          p5, 11-12 have 'quieres'/'quiero' repeatedly translated as "with friends like those who needs enemies", which is just inexplicable. I can't imagine how that would happen.

          Corrupted dictionary?

          ...and most of the trickiest words on a page aren't translated, maybe because they're not in your dictionary or have 'lo' or 'se' appended.

          • bilinualcomOP 5 years ago

            Thanks, we are working to improve the quality of both our dictionaries and our ML engine. It's very hard to say why the translation picked these; I'd have to look into each one individually to answer your questions. The translations are not perfect, but it is a living project and I am trying to improve it in every hour I can find.

            "I can't imagine how that would happen." : Just as a hint, click on "translations" here:

            https://en.wiktionary.org/wiki/with_friends_like_these_who_n...

          • hombre_fatal 5 years ago

            Yeah, something is going very wrong. It's as if they were trying to tokenize HTML/PDF, pulling in a lot of extraneous characters/bytes, and using some sort of homebrew ML project to translate it. I don't know how else you'd get such bizarre results.

            • yesenadam 5 years ago

              Yes, not to mention that quieres and quiero should be near the top of the Words So Common That No Translation Is Needed list.

    • gluejar 5 years ago

      Me too! GITenberg started with a variety of goals, some of which were achieved and others less so. As a prototype, it was outstanding. Standard Ebooks has been much more successful at creating a community working to improve the quality of PG-derived ebooks. We're on the same team.

thangalin 5 years ago

I wrote a technical comparison between various public domain eBook projects:

https://dave.autonoma.ca/blog/2020/04/11/project-gutenberg-p...

koolba 5 years ago

They’re probably going to have to change their name to something that does not have “git” in it: https://public-inbox.org/git/20170202022655.2jwvudhvo4hmueaw...

  • sethish 5 years ago

    Gitenberg was one of the alternate spellings of Gutenberg in the 1400s.

    • agumonkey 5 years ago

      I propose Gytenberg

      • artiszt 5 years ago

        Or, following the widely accepted convention of the time [in German and Slavic languages], 'v' instead of 'u', and vice versa.

        • agumonkey 5 years ago

          Ah well, for some reason the visual aesthetics of gytenberg vs gvtenberg made me choose the former (even though at first I typed gvtenberg).

  • bhickey 5 years ago

    > It's hard to hold them responsible for picking a name that violated a policy that didn't yet exist.

    fwiw, Gitenberg has been around at least since 2012.

  • metiscus 5 years ago

    They've apparently had that name for six years, so I'm not sure how that squares with what you linked above. It's a valid point, though, and hopefully it doesn't cause problems.

maire 5 years ago

This looks like a subset of Project Gutenberg?

The Girl from Alsace in Gitenberg: https://www.gitenberg.org/book/35926

The Girl from Alsace in Gutenberg: http://www.gutenberg.org/ebooks/35926

The numbers are even the same which seems suspicious. Hmmm.

  • sethish 5 years ago

    Yes, absolutely. Apologies if that's not clear from the website. GITenberg started as an experimental fork of PG, but due to the work of Eric Hellman at the Free Ebook Foundation, much of the infrastructure, such as metadata formats, CI/CD for building books, and the DVCS backend, is being ported upstream to the Project Gutenberg infrastructure.

    • gluejar 5 years ago

      Most visibly, the generated covers, which have roots at NYPL and Bookalope, went into PG via GITenberg.

    • maire 5 years ago

      Thanks for the clarification!

      • bilinualcomOP 5 years ago

        OP here. I doubt that the website is "a SUBSET of Project Gutenberg"; this is not mentioned anywhere on the website. As the PG website states, while you can use the books freely, "The name 'Project Gutenberg' is a registered trademark." I guess this is the reason they didn't mention PG.

        • sethish 5 years ago

          Now I remember, that was the reason I wasn't clearer about the connection to PG when I wrote the website https://github.com/gitenberg-dev/giten_site/. Since then my co-founder Eric Hellman has been doing engineering work for Project Gutenberg, as well as running the rest of the Free Ebook Foundation, which is the parent org of GITenberg, free-programming-books, and Unglue.it.

          I think that the GITenberg collection contains all of the books in PG. At this point, new repos are created automatically when Distributed Proofreaders creates a new book in PG. Originally, I didn't include around 400 PG books because their creators claimed copyright, and I didn't include Bruce Sterling's book because he wouldn't let me re-license it under Creative Commons rather than his pseudo-public-domain license.

          Not much has been happening with GITenberg itself in the past few years. But luckily, a lot of the concepts and code are getting upstreamed into PG, which, in my opinion, is way, way better.

  • chmod775 5 years ago

    "GITenberg is an exploration of how Project Gutenberg might work if all the Gutenberg texts were on Github, so that tools like version control, continuous integration, and pull-request workflow could be employed."

    https://ebookfoundation.org/

  • bhickey 5 years ago

    It's a version controlled dump of Gutenberg.

  • gluejar 5 years ago

    More accurate to call it a fork.

hnarayanan 5 years ago

I don't know why, but I expected some amazing typography in their PDFs. :(

  • sethish 5 years ago

    [Standard Ebooks](https://standardebooks.org/) has accomplished a lot in producing better ebooks of public domain texts.

    • bilinualcomOP 5 years ago

      OP here. It seems Standard Ebooks doesn't provide books in PDF format. My side project, https://www.bilinual.com, rebuilds the ebooks in PDF format with translation hints, if you don't mind learning a new language while reading your favourite books ;)

      • sethish 5 years ago

        There are about 400 GITenberg books that have CC-by licensed covers provided by Recovering the Classics. If you're interested in using that art for your PDFs I can find you the index!

        • bilinualcomOP 5 years ago

          Thanks, that would be great. I'm curious why they have a CC license rather than being public domain?

          • robin_reala 5 years ago

            We have this problem with Standard Ebooks. The number of people that say things are public domain without actually checking it is very high. A CC0 licence is an explicit grant of public domain status by the licensor, and hence the legal issues rest with them in the event of any problem.

            Public domain obviously can be ascertained, but if CC0 hasn’t been granted we rely on dated reproductions: basically a photograph of the artwork in question in a book or journal with a copyright date of 1924 or earlier.

            (that’s obviously specifically a US legal reading, but SE from a legal point of view is a US project)

            • maxerickson 5 years ago

              CC0 isn't an indemnity.

              The difference between falsely claiming something is public domain and falsely claiming to grant a license to it under CC0 is going to be pretty minimal (and is likely to result in little more than "please stop", blood and turnips and so on).

sethish 5 years ago

You might be more familiar with another project the Free Ebook Foundation maintains: https://github.com/EbookFoundation/free-programming-books/, which is one of the top-10 repos on GitHub by number of stars.

anaphor 5 years ago

I like how they're able to accurately tag translators vs original authors, e.g. https://github.com/GITenberg/The-History-of-the-Peloponnesia...

voldemort1968 5 years ago

What is improved by adding Git to PG?

  • bilinualcomOP 5 years ago

    Well, for my side project, https://www.bilinual.com, I needed version control for changes made to a book (fixed typos, ...), and I found the GITenberg project. Several interesting tools developed during the project are available here: https://github.com/gitenberg-dev

  • sethish 5 years ago

    It's not so much about just adding git to books, but adding git to the transcriptions of books from printed materials. There are errors in the transcriptions, many books lack formatting, and many books pre-date Unicode and are in ASCII. Tracking the book's source files in git makes it easier to collaborate on making these changes, or to see what changes have been made.
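To illustrate the "seeing what changes have been made" point, here is a minimal sketch of the kind of line-level diff that version control surfaces when a transcription error is fixed. It is an assumption-laden illustration, not GITenberg's tooling: it uses Python's standard-library `difflib` instead of git itself, and the function name `show_fixes` and the sample text are hypothetical.

```python
import difflib


def show_fixes(old_text: str, new_text: str, name: str = "book.txt") -> str:
    """Return a unified diff between two versions of a transcription.

    Hypothetical helper: it approximates the line-level view `git diff`
    gives, using only the standard library (git is not involved).
    """
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)


# Hypothetical sample: an OCR typo ("tirnes") is fixed and ASCII quotes
# are upgraded to Unicode curly quotes in a later revision.
scraped = 'It was the best of tirnes,\nit was the worst of times.\n"Hello," she said.\n'
updated = "It was the best of times,\nit was the worst of times.\n\u201cHello,\u201d she said.\n"

if __name__ == "__main__":
    print(show_fixes(scraped, updated))
```

Only the two changed lines appear with `-`/`+` markers, while the untouched middle line is shown as context; two identical versions produce an empty diff. In a git-tracked repo, the same information comes from `git diff` or `git log -p` on the book's source file.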