Need Free Software Project Ideas

Santhosh Thottingal santhosh.thottingal at gmail.com
Thu May 20 08:22:36 PDT 2010


Any updates on this? Did you guys get anybody?
-santhosh

On Wed, Apr 28, 2010 at 10:00 AM,  <santhosh00 at gmail.com> wrote:
>> @Santhosh - Can we convert wiki2cd requirements into something that
>> requires 4 students efforts for a year? We will have to build it out
>> and present it as something a project guide would agree to. Can you
>> please send a couple of lines abstract on what we propose as the
>> project?
>>
>
>
> wiki2cd has the following wishlist items. There are many others, but the
> following require some amount of engineering.
>
> Full text index and search
> http://github.com/santhoshtr/wiki2cd/issues/unreads#issue/2
> This wishlist item is for providing full-text indexing and search over
> the articles we provide on a static repository (CD/DVD). But the
> requirement itself has potential as a separate standalone project: a
> library that can be used by many web projects without being limited to
> any particular language. The concept is this:
> 0) A file parser which can parse various types of files containing
> non-Latin text, e.g. Malayalam. File types can be plain text, html, pdf,
> xml, odt. To be written in python.
> 1) A tokenizer which can tokenize a given set of words. The tokenizer
> should also track the frequency of tokens, which is used for ranking
> later. To be written in python.
> 2) An indexer which can index the tokens, keeping the ranking and the
> source file. The indexer should also take care of scalability issues:
> the size of the index, using buckets for indexes, and designing and
> developing a format for keeping a large amount of index data in a
> space-efficient way. The format of the index file is JSON. It should be
> capable of saving the index to a database too. To be written in python.
> 3) A fast search algorithm to look up the search keys in the index and
> come up with the results. This search should be done using
> jquery/javascript.
> 4) A web interface for searching and displaying the results based on
> ranking.
> 5) Integrating the approximate search algorithms that I developed into
> the search to improve the search efficiency. Cross-language, sounds-like
> and approximate search are desired.
>
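To make steps 1 to 3 concrete for a project abstract, here is a minimal
sketch in Python. It is not the code behind the proof of concept. The
function names, the first-character bucketing rule, and the ranking by
summed token frequency are all assumptions made for illustration. The one
language-specific point is that a plain `\w+` regex splits Malayalam
words at vowel signs, so the tokenizer keeps combining marks and the
zero-width joiners inside a token.

```python
import json
import unicodedata
from collections import Counter, defaultdict

JOINERS = "\u200c\u200d"  # ZWNJ and ZWJ occur inside Malayalam words


def tokenize(text):
    """Split text into lowercase tokens, keeping combining marks attached."""
    text = unicodedata.normalize("NFC", text)
    tokens, current = [], []
    for ch in text:
        # Letters (L*), marks (M*) and numbers (N*) belong to a token.
        if unicodedata.category(ch)[0] in "LMN" or ch in JOINERS:
            current.append(ch)
        elif current:
            tokens.append("".join(current).lower())
            current = []
    if current:
        tokens.append("".join(current).lower())
    return tokens


def build_index(documents):
    """Map token -> {source file: frequency}; documents maps file -> text."""
    index = defaultdict(dict)
    for source, text in documents.items():
        for token, freq in Counter(tokenize(text)).items():
            index[token][source] = freq
    return dict(index)


def to_json_buckets(index):
    """Assumed bucketing rule: one JSON string per first character of the token."""
    buckets = defaultdict(dict)
    for token, postings in index.items():
        buckets[token[0]][token] = postings
    return {key: json.dumps(bucket, ensure_ascii=False, sort_keys=True)
            for key, bucket in buckets.items()}


def search(index, query):
    """Rank sources by the summed frequency of the query tokens they contain."""
    scores = Counter()
    for token in tokenize(query):
        for source, freq in index.get(token, {}).items():
            scores[source] += freq
    return [source for source, _ in scores.most_common()]
```

With buckets like these, the javascript side would only need to fetch the
JSON file for the first character of each search key instead of the whole
index. Whether that split is fine-grained enough for a full wiki is one of
the scalability questions left to the project.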
> status: Proof of concept is available here
> http://thottingal.in/projects/jsonindexer/search.html
> value add to language computing: A feature-rich, language-aware web
> search system.
> Issue: This project is time-bound and one of my top-priority todo items.
> Incremental workable solutions should be available from the early stages
> of development itself. Malayalam wiki version 2.0 depends on the
> completion of this project. Tamil wiki 1.0 is supposed to be released by
> October 2010.
> i.e., we can't go in slow mode even if the students have 1 year. Very
> fast implementation is expected.
>
> http://github.com/santhoshtr/wiki2cd/issues#issue/3
> The source code of wiki articles on CD should be free from unwanted wiki
> codes.
> This wishlist item is a web-scraping type of requirement, and specific to
> wiki2cd. wiki2cd does not provide all the contents of a wiki page to the
> user. Only the article needs to be presented. Currently, css is used to
> hide the unwanted areas of the screen, but the content remains in the
> html. The requirement is to remove the unwanted content (divs, spans)
> without the css hide technique. Proposed technology is pyquery.
> - Can't wait 1 year for this to be implemented. A workable solution
> should be ready within at most 2 months.
>
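The idea of dropping the unwanted elements instead of hiding them can be
sketched as below. The proposal names pyquery. This sketch swaps in the
standard library's html.parser so that it runs without extra packages, and
it is only an illustration, not wiki2cd code. The class and id names in
the usage example are assumed examples of unwanted markup. Comments and
doctype declarations are dropped, and badly nested html inside a removed
element would confuse the depth counter.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ChromeStripper(HTMLParser):
    """Re-emit html while dropping elements with an unwanted class or id."""

    def __init__(self, unwanted_classes, unwanted_ids):
        super().__init__(convert_charrefs=False)
        self.unwanted_classes = set(unwanted_classes)
        self.unwanted_ids = set(unwanted_ids)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def _is_unwanted(self, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        return (bool(classes & self.unwanted_classes)
                or attrs.get("id") in self.unwanted_ids)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._is_unwanted(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a nesting level.
        if not self.skip_depth and not self._is_unwanted(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append("&#%s;" % name)


def strip_chrome(html, unwanted_classes=(), unwanted_ids=()):
    """Return html with the unwanted elements and their contents removed."""
    parser = ChromeStripper(unwanted_classes, unwanted_ids)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Usage: the class and id below are assumed examples of unwanted markup.
page = ('<div id="content"><h1>Kerala</h1>'
        '<span class="editsection">[<a href="#">edit</a>]</span>'
        '<p>Article text.</p><div id="siteSub">From Wikipedia</div></div>')
print(strip_chrome(page, ["editsection"], ["siteSub"]))
```

The same blocklist of classes and ids would map directly onto selectors if
the real implementation goes with pyquery as proposed.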
> Other project ideas:
> PHP client libraries for silpa services.
> (e.g. http://thottingal.in/projects/spellchecker/)
> Porting SILPA algorithms to PHP (Anoop has done it for the transliterator)
> A desktop GUI for silpa.
>
>
>
>

