Feed aggregator
Season Posters Anyone?
I simply cannot stop playing around with TV Shows and Nepomuk. We already had posters for the series but not for the seasons. Well, we do now:
Fun, isn’t it? But sadly it required an improved libtvdb and additions to SDO. The former is already in git master. The SDO changes are a bit experimental, which is why I put them into the branch nmm/banners. Since both requirements are not that easy to install, I put the improved season handling of nepomuktvnamer into the branch seasonResources.
So in order to try this yourself you need to get libtvdb git master and the mentioned branches of SDO and nepomuktvnamer. But do not worry, I am pretty sure that I will get the SDO changes merged soon. Then this will become easier.
A Fun Release: Nepomuk TV Namer 0.2
As requested I prepared a release of the TV Show managing thingi I implemented. You can download it from download.kde.org mirrors at unstable/nepomuk/nepomuktvnamer-0.2.0.tar.bz2.
The nepomuktvnamer 0.2.0 is a little more polished than the original version and comes with a nice service menu extension that lets you manually start the fetching of TV Show information on folders or video files. This is important since the service only reacts to new videos. So you need to start the initial information fetching manually on your TV Show folder.
The tvnamer has two requirements in addition to the typical KDE ones:
- LibTVDb – LibTvdb is a Qt-based library which provides asynchronous access to TV series information from thetvdb.com via a very simple interface. Its use in the Nepomuk TV namer should be obvious.
- Shared-Desktop-Ontologies 0.9.0 – The recently released new version of SDO provides the required nfo:depiction property used by the tvnamer to store banners.
I also recommend applying the kdelibs patch I mentioned earlier to actually see the TV Show banners. Have fun with it – maybe someone will even package it.
Just For The Fun Of It: Browsing Music With Nepomuk
Since implementing the TV Show KIO slave was that easy I decided I could do the same for music – just to show how simple it can be. There are a few more lines but that is only because I added browsing by album, artist, and genre. So there are a lot of if/else constructs. Anyway, here goes:
Browsing music by artist is easy. As you can see I also implemented a preview generator plugin the same way I did for the TV Shows. The only problem is that there is no tool yet that automatically fetches those images. Thus, I had to do it manually for one example which looks somewhat like this:
qdbus org.kde.NepomukStorage /datamanagement org.kde.nepomuk.DataManagement.addProperty "nepomuk:/res/0152825f-5c49-4ca8-aa0a-23fc9a1305f1" "nfo:depiction" "/home/trueg/atb2.jpg" "shell"
This is part of the fancy Data management API which allows me to add the file atb2.jpg as a nfo:depiction of the nco:Contact resource identifying the artist ATB.
Anyway, entering the artist themselves and what lies beyond:

(Again I had to fetch the cover art manually. I did not want to implement my own cover art retrieval tool and I found the Amarok code not to be very reusable. Again maybe someone wants to take up this task?)
Finally we end up in the album tracks. Sadly dragging an album to a media player playlist does not work yet. I am not quite sure how to fix that.
Last but not least a quick look at browsing by genre:
This was fun. But before I go to bed let me share with you the very simple code which is responsible for the nice previews (abbreviated of course):
bool MusicThumbCreator::create(const QString &path,
                               int w, int h,
                               QImage &img)
{
    KUrl url(path);
    QStringList pathTokens
        = url.path().split('/', QString::SkipEmptyParts);
    if(pathTokens.count() < 2) {
        return false;
    }

    // there are only two cases for us: artists and albums
    if(pathTokens[pathTokens.count()-2] == QLatin1String("artists") ||
       pathTokens[pathTokens.count()-2] == QLatin1String("albums")) {
        const QUrl uri = recoverUriFromUrlToken(pathTokens.last());

        // we just query the first depiction there is
        Soprano::QueryResultIterator it
            = Nepomuk::ResourceManager::instance()->mainModel()
              ->executeQuery(
                  QString::fromLatin1("select ?u where { "
                                      "%1 nfo:depiction [ nie:url ?u ] . "
                                      "} LIMIT 1")
                  .arg(Soprano::Node::resourceToN3(uri)),
                  Soprano::Query::QueryLanguageSparql);
        if(it.next()) {
            // report failure if the image file cannot be loaded
            return img.load(it["u"].uri().toLocalFile());
        }
    }
    return false;
}
The rest of the code can be found in the nepomuk-audio-kio-slave scratch repository. Maybe at some point I could just throw all of those things into some “Nepomuk KIO extensions” package… oh, well, off to bed now…
Notably v0.4
I meant to release a new version of Notably on Friday, but I got sidetracked with some stuff. Plus, I've been spending a lot of time on designing the UI for this release, which I think isn't a good idea. Notably is still not quite mature, and I think right now features are more important than polish.
Last week, I showcased some tagging UIs. They aren't yet ready to be deployed in KDE, as they need to be polished quite a bit. Plus, there is a lot of scope for collaboration when designing UIs.
Changes
Revamped UI
I've gotten rid of most of the custom KWin code. I'd initially wanted my application to look quite different, with a blurred background and fixed size. But that would be locking the user into a fixed interface.
Notably now looks and behaves more like a KDE application. (No more blurred background)
Better Sidebar
Most of the code improvements have been in the sidebar, which now acts as a proper menu and allows navigation.
Experimental Widgets
Some brand new widgets:
Tag Widget
I showcased the new Tag Widget I was working on a couple of days ago. Since then, I've improved the code to make it more maintainable. Unfortunately, it still needs a lot of work.
Tag Cloud
Creating a Tag Cloud turned out to be a greater challenge than I expected. Right now it's implemented with some basic HTML in a QTextBrowser. I'm still experimenting with some custom layout code. Let's see how it goes.
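To illustrate the "basic HTML" approach, here is a rough sketch in Python rather than Notably's C++. It scales each tag's font size linearly by usage count. The function name, the point-size range and the tag:// link scheme are assumptions made for this example and are not taken from the Notably source.

```python
from html import escape


def tag_cloud_html(tag_counts, min_pt=9, max_pt=24):
    """Return an HTML fragment with one link per tag, sized by usage count."""
    if not tag_counts:
        return ""
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    parts = []
    for tag in sorted(tag_counts):
        # linear interpolation between the smallest and largest font size
        size = min_pt + (tag_counts[tag] - lo) * (max_pt - min_pt) // span
        parts.append('<a href="tag://%s" style="font-size:%dpt">%s</a>'
                     % (escape(tag, quote=True), size, escape(tag)))
    return " ".join(parts)
```

The resulting string could be handed to any rich-text widget. A custom layout would replace the joining step, not the size calculation.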
Tag Browsing
You can browse your notes based on the tags they have been given. This will eventually have to be expanded to allow multiple facets - like tags, dates and so on. Implementing it on the Nepomuk side is fairly simple, but I'm not sure about the interface.
After a couple more releases, when I've gotten most of the main features down, I'll start polishing it up and moving it to extragear :)
Source Code: kde:notably
Virtuoso going crazy?
There have been cases of virtuoso going a little crazy and consuming a lot of CPU cycles. It's extremely frustrating. However, it's even more annoying when you have no idea what's wrong.
Most of the bug reports we get just say that virtuoso is consuming too much CPU, and that isn't the least bit helpful. So, here is a short guide to figure out what query is causing virtuoso to go crazy.
Listing Queries
Nepomuk contains a query service which is used to cache queries and to execute them asynchronously. We can use it at any point to figure out which queries are being executed.
$ qdbus org.kde.nepomuk.services.nepomukqueryservice
/
/nepomukqueryservice
/nepomukqueryservice/query1
/nepomukqueryservice/query4
/servicecontrol
Each of the /nepomukqueryservice/query[n] represents one query.
Getting the SPARQL Query
$ qdbus org.kde.nepomuk.services.nepomukqueryservice /nepomukqueryservice/query4 queryString
And you'll get something like this -
select distinct ?r ?v2 where { { ?r a
<http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#Note> . ?r
<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#created> ?v2 . }
. ?r <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible>
?v1 . FILTER(?v1>0) . } ORDER BY DESC ( ?v2 )
This query is extremely important because without it finding the cause is nearly impossible.
Killing queries
$ qdbus org.kde.nepomuk.services.nepomukqueryservice /nepomukqueryservice/query4 close
This will end the query.
When/If you find virtuoso consuming too much CPU, list out all the queries and close each of them one by one. The moment virtuoso gets better, you'll have your culprit.
That's the query you should post in the bug report.
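The list-and-close procedure can be scripted. The following Python sketch uses only the three qdbus invocations shown above. How to decide that virtuoso "got better" is not specified in this guide, so the is_busy callable is an assumption supplied by the caller (it could, for example, sleep a few seconds and then inspect the process's CPU usage). The run parameter exists only so the logic can be exercised without a D-Bus session.

```python
import subprocess

SERVICE = "org.kde.nepomuk.services.nepomukqueryservice"


def qdbus(*args):
    """Run qdbus with the given arguments and return its output as text."""
    return subprocess.check_output(("qdbus",) + args, text=True)


def find_culprit(is_busy, run=qdbus):
    """Close queries one by one; return the SPARQL string that kept Virtuoso busy.

    is_busy is a caller-supplied callable (an assumption of this sketch) that
    reports whether Virtuoso is still consuming too much CPU.
    """
    # keep only the object paths that represent queries
    paths = [line.strip() for line in run(SERVICE).splitlines()
             if line.strip().startswith("/nepomukqueryservice/query")]
    for path in paths:
        sparql = run(SERVICE, path, "queryString").strip()
        run(SERVICE, path, "close")
        if not is_busy():
            return sparql  # closing this query calmed Virtuoso down
    return None
```

Note that the sketch closes every query it visits, exactly like the manual procedure, so open searches will be interrupted.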
TV Series KIO Slave Preview Issue Fixed
I fixed the thumbnailing issue. It even selects an image which has the best aspect ratio to begin with – no more squeezed banners:
But as feared it requires a patch to kdelibs which I hope to get into KDE 4.8.1.
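How the KIO slave picks the image is not shown in this post. As a rough illustration of the idea, the following Python sketch chooses, from a list of candidate image sizes, the one whose aspect ratio is closest to the requested thumbnail size. The function name and the plain ratio difference used as the distance measure are assumptions of the example.

```python
def best_aspect_match(sizes, target_w, target_h):
    """Return the (width, height) pair whose aspect ratio is closest to the target."""
    target = target_w / target_h
    return min(sizes, key=lambda wh: abs(wh[0] / wh[1] - target))
```

For a square thumbnail this prefers a poster-like or square image over a wide banner, which is what avoids the squeezing.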
More Fun With TV Shows
After fetching all the details about TV Shows from thetvdb.com I went back to my favorite way of browsing things: KIO slaves. So without further ado let me introduce the tvshow:/ KIO slave:
So the root folder lists all TV Series. As you can see the previews are messed up aspect-ratio-wise. If anyone has an idea of how to improve that without patching KIcon or KIO or caching my own thumbnails in some tmp folder please tell me.
And finally the episodes. And just because it is fun here is one more:
Why do this? Well, nepomuksearch cannot create sub-folders (yet) and this only has about 120 relevant lines of code, most of which is used up by the three queries it creates.
To try it simply update your git clone of the nepomuktvnamer and have fun.
A Little Bit Of Query Optimization
Every once in a while I add another piece of query optimization code to the Nepomuk Query API. This time it was a direct result of my earlier TV Show handling. I simply thought that a query like “downton season=2 episode=2” takes too long to complete.
Now in order to understand this you need to know that there is a rather simple QueryParser class which converts a query string like the above into a Nepomuk::Query::Query which is simply a collection of Nepomuk::Query::Term instances. A Query instance is then converted into a SPARQL query which can be handled by Virtuoso. This SPARQL query already contains a set of optimizations, some specific to Virtuoso, some specific to Nepomuk. Of course there is always room for improvement.
So let us get back to our query “downton season=2 episode=2” and look at the resulting SPARQL query string (I simplified the query a bit for readability. The important parts are still there):
select distinct ?r where {
  ?r nmm:season "2"^^xsd:int .
  {
    ?r nmm:hasEpisode ?v2 .
    ?v2 ?v3 "2"^^xsd:int .
    ?v3 rdfs:subPropertyOf rdfs:label .
  } UNION {
    ?r nmm:episodeNumber "2"^^xsd:int .
  } .
  {
    ?r ?v4 ?v6 .
    FILTER(bif:contains(?v6, "'downton'")) .
  } UNION {
    ?r ?v4 ?v7 .
    ?v7 ?v5 ?v6 .
    ?v5 rdfs:subPropertyOf rdfs:label .
    FILTER(bif:contains(?v6, "'downton'")) .
  } .
}
Like the user query the SPARQL query has three main parts: the graph pattern checking the nmm:season, the graph pattern checking the episode and the graph pattern checking the full text search term “downton”. The latter we can safely ignore in this case. It is always a UNION so full text searches also include relations to tags and the like.
The interesting bit is the first UNION, the one checking the episode. The query parser matched the term “episode” to the properties nmm:hasEpisode and nmm:episodeNumber. At first glance this is fine since both contain the term “episode”. However, the property nmm:season which is used in the first non-optional graph pattern has a domain of nmm:TVShow. nmm:hasEpisode on the other hand has a domain of nmm:TVSeries. That means that the first pattern in the UNION can never match in combination with the first graph pattern since the domains are different.
The obvious optimization is to remove the first part of the UNION which yields a much simpler and way faster query:
select distinct ?r where {
  ?r nmm:season "2"^^xsd:int .
  ?r nmm:episodeNumber "2"^^xsd:int .
  {
    ?r ?v4 ?v6 .
    FILTER(bif:contains(?v6, "'downton'")) .
  } UNION {
    ?r ?v4 ?v7 .
    ?v7 ?v5 ?v6 .
    ?v5 rdfs:subPropertyOf rdfs:label .
    FILTER(bif:contains(?v6, "'downton'")) .
  } .
}
Well, sadly this is not generically true since resources can be double/triple/whateveriple-typed, meaning that in theory an nmm:TVShow could also have type nmm:TVSeries. In this case it is obviously not likely but there are many cases in which it in fact does apply. Thus, this optimization cannot be applied to all queries. I will, however, include it in the parser where it is very likely that the user does not take double-typing into account.
If you have good examples that show why this optimization should not be included in the query parser by default please tell me so I can re-consider.
Now after having written this and proof-reading the SPARQL query I realize that this particular query could have been optimized in a much simpler way: the value “2” is obviously an integer value, thus it can never match a non-literal as required by the nmm:hasEpisode property…
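Both checks discussed here, the domain comparison and the literal-versus-resource comparison, can be sketched outside of the Query API. The Python below is only an illustration: the property table is hand-written (the domains are the ones named in the text, the "literal" flags are assumptions), and the function name and the assume_single_type switch are made up for the example. It is not how the QueryParser is implemented.

```python
# Property metadata needed for the checks. Domains as named in the text;
# the "literal" flags are assumptions of this sketch.
PROPERTIES = {
    "nmm:season":        {"domain": "nmm:TVShow",   "literal": True},
    "nmm:episodeNumber": {"domain": "nmm:TVShow",   "literal": True},
    "nmm:hasEpisode":    {"domain": "nmm:TVSeries", "literal": False},
}


def prune_candidates(candidates, required, value, assume_single_type=True):
    """Drop candidate properties that cannot match alongside the required ones."""
    domains = {PROPERTIES[p]["domain"] for p in required}
    kept = []
    for prop in candidates:
        info = PROPERTIES[prop]
        if not info["literal"] and not isinstance(value, str):
            continue  # a number cannot be the label of a related resource
        if assume_single_type and domains and info["domain"] not in domains:
            continue  # different domain; only safe if double-typing is ignored
        kept.append(prop)
    return kept
```

For the example query, the candidates nmm:hasEpisode and nmm:episodeNumber with the required property nmm:season and the value 2 reduce to nmm:episodeNumber alone. Passing assume_single_type=False keeps the domain check out of the way for data that may be double-typed.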
A Little Drier But Not That Dry: Extracting Websites From Nepomuk Resources
After writing about my TV Show Namer I want to get out some more ideas and examples before I retire as a full-time KDE developer in a few weeks.
The original idea for what I am about to present came a long while ago when I remembered that Vishesh gave me a link on IRC but I could not remember when exactly. So I figured that it would be nice to extract web links from Nepomuk resources to be able to query and browse them.
As always what I figured would be a quick thing led me to a few bugs which I needed to fix before moving on. So all in all it took much longer than I had hoped. Anyway, the result is another small application called nepomukwebsiteextractor. It is a small tool without a UI which will extract websites from the given resource or file. If called without arguments it will query for resources which do not have any related websites and extract websites from them. Since it tries to fetch a title for each website this is a very slow procedure.
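The extraction step itself is not shown in this post. As a rough idea of what it involves, here is a Python sketch that finds links in plain text and derives the domain URL for each. The regular expression is deliberately simple and is an assumption of the example, as is the function name; the real tool may well use different rules.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern (an assumption of this sketch): http(s) URLs up
# to the next whitespace, angle bracket, quote or closing bracket.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_websites(text):
    """Return (url, domain_url) pairs for every distinct link found in text."""
    found = []
    for match in URL_RE.finditer(text):
        url = match.group().rstrip(".,;:!?")  # drop trailing punctuation
        parts = urlsplit(url)
        pair = (url, "%s://%s" % (parts.scheme, parts.netloc))
        if pair not in found:
            found.append(pair)
    return found
```

Each pair corresponds to one website resource and the domain resource it would be made part of.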
As before the storing to Nepomuk is the easy part. Getting the information is way harder:
using namespace Nepomuk;
using namespace Nepomuk::Vocabulary;
// create the main Website resource
NFO::Website website(url);
website.addType(NFO::WebDataObject());
QString title = fetchHtmlPageTitle(url);
if(!title.isEmpty()) {
    website.setTitle(title);
}
// create the domain website resource
KUrl domainUrl = extractDomain(url);
NFO::Website domainWebPage(domainUrl);
domainWebPage.addType(NFO::WebDataObject());
domainWebPage.addPart(website.uri());
title = fetchHtmlPageTitle(domainUrl);
if(!title.isEmpty()) {
    domainWebPage.setTitle(title);
}
// relate the two via the nie:isPartOf relation
website.addProperty(NIE::isPartOf(), domainUrl);
domainWebPage.addProperty(NIE::hasPart(), website.uri());
// funnily enough the domain is a sub-resource of the website
// this is so removing the website will also remove the domain
// as it is the one which triggered the domain resource's creation
website.addSubResource(domainUrl);
// save it all to Nepomuk
Nepomuk::storeResources(SimpleResourceGraph() << website << domainWebPage);
Once done you will have thousands of nfo:Website resources in your Nepomuk database, each of which is related to its respective domain via nie:isPartOf (I am not entirely sure if this is perfectly sound but it is convenient as far as graph traversal goes). We can of course query those resources with nepomukshell (this is trivial but allows me to pimp up this blog post with a screenshot):
And of course Dolphin shows the extracted links in its meta-data panel:
I am not entirely sure how to usefully show this information to the user yet but it is already quite nice to navigate the sub-graph which has been created here.
Of course we could query all the resources which mention a link with domain www.kde.org:
select ?r where {
  ?r nie:links ?w .
  ?w a nfo:Website .
  ?w nie:isPartOf ?p .
  ?p nie:url <http://www.kde.org> .
}
Or the Nepomuk API version of the same:
using namespace Nepomuk::Query;
using namespace Nepomuk::Vocabulary;
Query query =
    ComparisonTerm(NIE::links(),
                   ResourceTypeTerm(NFO::Website()) &&
                   ComparisonTerm(NIE::isPartOf(),
                                  ResourceTerm(QUrl("http://www.kde.org"))
                   )
    );
It gets even more interesting when combined with the nfo:Websites created by KParts when downloading files.
Well, now I provided screenshots, code examples, and a link to a repository – I think it is all there – have fun.
Update: In the spirit of promoting the previously mentioned ResourceWatcher here is how the website extractor would monitor for new stuff to be extracted:
Nepomuk::ResourceWatcher* watcher = new Nepomuk::ResourceWatcher(this);
watcher->addProperty(NIE::plainTextContent());
connect(watcher,
        SIGNAL(propertyAdded(Nepomuk::Resource,
                             Nepomuk::Types::Property,
                             QVariant)),
        this,
        SLOT(slotPropertyAdded(Nepomuk::Resource,
                               Nepomuk::Types::Property,
                               QVariant)));
watcher->start();
[...]
void slotPropertyAdded(const Nepomuk::Resource& res,
                       const Nepomuk::Types::Property&,
                       const QVariant& value) {
    if(!hasOneOfThoseXmlOrRdfMimeTypes(res)) {
        const QString text = value.toString();
        extractWebsites(res, text);
    }
}
A better Tagging Widget
A long long time ago, a very simple tagging widget was implemented. We always thought - "Eh! This is temporary. We'll come up with a better one later." But that never happened.
There is a lot of code in Nepomuk. However most of it is backend stuff which does absolutely marvelous things behind the scenes - Auto duplicate merging, type checking with respect to the ontologies, caching and lots more. We, however, lack good UIs.
So, if you're a UI designer looking for a challenge, look at Nepomuk. We have a lot of data.
Anyway, enough promotion! Unlike yesterday, I won't be pointing you towards the source (though it isn't that hard to find). I'll just be showcasing some screenshots. You'll get to try out the tagging widget and whatever-is-in-store-for-tomorrow on Friday.
This was originally implemented with a QListView in flow mode with a custom delegate for tags. Getting it to automatically resize was a pain, and I was missing out on a lot of effects. Eventually, a couple of hours back, someone at #qt pointed me towards Flow Layouts.
I'm in the process of rewriting the old item delegate code, to a widget based one. Minus minor variations it should look the same.
As last time, if someone can make a nice mockup, I'll be more than happy to implement it :)
Nepomuk Tag Manager
Welcome to Nepomuk Tag Week! Well, not really, since it's not an official thing. I've just been working a lot with tags lately, and this week I'm going to be spamming you with some tag related updates (One for every day of the week, minus Monday)
I thought I'll start with something small - Tag Management.
We've been badly needing a UI to allow the users to modify, merge and delete their tags. You could always delete tags using the conventional "Add Tag" dialog, but this way you can do batch deletes.
I'm not much of a UI designer so the interface is quite bare. I'm hoping that someone can come up with a beautiful mockup, which I can then implement.
And with this I can close BUG 258323.
Source Code: kde:scratch/vhanda/nepomuktagmanager
Update -
I've added a Filter bar, merged the "Rename Tag" and "Merge Tags" buttons, and double clicking on a tag now opens it in the file browser.
Something Way Less Dry: TV Shows
After my rather boring blog about change notifications I will now write about something that I have wanted ever since I started developing Nepomuk. But only now has Nepomuk reached a point where it provides all the necessary pieces. I am talking about TV Show management – obviously I mean the rips from the DVD boxes I own.
So what about it? Well, I wrote a little tool called nepomuktvnamer (inspired by the great python tool tvnamer) which works a bit like our nepomukindexer except that it does not extract meta-data from the file but tries to fetch information about TV Shows from thetvdb.com. You can run the tool on a single file or recursively on a whole directory. It will then use a set of regular expressions (based on the ones from tvnamer) to analyze the file names and extract the show title, season and episode numbers.
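To give an idea of the file name analysis, here is a small Python sketch. The two patterns are illustrative only and are not the expressions used by tvnamer or nepomuktvnamer, which cover many more naming schemes. The function name is likewise made up for the example.

```python
import re

# Two illustrative patterns, not the actual tvnamer expressions:
# "Show.Name.S02E05.avi" and "Show Name - 2x05 - Title.mkv".
PATTERNS = [
    re.compile(r"^(?P<show>.+?)[ ._-]+[Ss](?P<season>\d+)[Ee](?P<episode>\d+)"),
    re.compile(r"^(?P<show>.+?)[ ._-]+(?P<season>\d+)[xX](?P<episode>\d+)"),
]


def parse_episode_filename(filename):
    """Return (show, season, episode), or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.match(filename)
        if match:
            # dots and underscores in the show name usually stand for spaces
            show = re.sub(r"[._]+", " ", match.group("show")).strip(" -")
            return show, int(match.group("season")), int(match.group("episode"))
    return None
```

The extracted show title is what would then be looked up on thetvdb.com, with season and episode number used to pick the right entry.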

The nepomuktvnamer will ask the user in case multiple matches have been found and cannot be filtered according to season and episode numbers.
It will then save that information into Nepomuk through our powerful Data Management API. The code looks roughly as follows, ignoring the code that stores actors, banners and the like:
const Tvdb::Series series = getSeriesForName(name);

Nepomuk::NMM::TVSeries seriesRes;
seriesRes.setTitle(series.name());
seriesRes.addDescription(series.overview());

Nepomuk::NMM::TVShow episodeRes(url);
episodeRes.setEpisodeNumber(episode);
episodeRes.setSeason(season);
episodeRes.setTitle(series[season][episode].name());
episodeRes.setSynopsis(series[season][episode].overview());
episodeRes.setReleaseDate(QDateTime(series[season][episode].firstAired(), QTime(), Qt::UTC));
episodeRes.setGenres(series.genres());

seriesRes.addEpisode(episodeRes.uri());
episodeRes.setSeries(seriesRes.uri());

Nepomuk::SimpleResourceGraph graph;
graph << episodeRes << seriesRes;
Nepomuk::storeResources(graph, Nepomuk::IdentifyNew, Nepomuk::OverwriteProperties);
(This code uses my very own LibTvdb which is essentially a Qt’ish wrapper around the thetvdb.org API.)
The result of this can be seen in Dolphin:
Here we see the actors, the series, the synopsis and so on. Clicking on an actor will bring up all they played in, clicking on the series will bring up all the episodes from that series, and so on.
Now let us have a look at the series itself using my beefed up version of the Nepomuk KIO slave:
As we can see the nepomuktvnamer also fetched a banner which is stored as nfo:depiction. (This is one reason why you need the git master version of shared-desktop-ontologies to compile nepomuktvnamer. Oh, and nepomuktvnamer is also linked against libnepomukcore from nepomuk-core instead of libnepomuk. So you either have to install nepomuk-core, which can be a bit tricky, or quickly change the CMakeLists.txt to link to libnepomuk instead.)
We can of course also query the newly created information. Simple queries in Dolphin could be “series:Sherlock” or “sherlock season=1”. Well, things to play with.
I also created the smallest Nepomuk service to date: the nepomuktvnamerservice uses the ResourceWatcher to listen for newly created nfo:Video resources and simply calls the nepomuktvnamer on the related file.
Last but not least the git repository contains a python script which checks for each existing series if a new episode has been aired. The output looks a bit like this:
White Collar - New episode "Withdrawal" (02x01) first aired 13 July 2010.
Freaks and Geeks - No new episode found.
The Mentalist - Upcoming episode "Red is the New Black" (04x13) will air 02 February 2012.
Now obviously this is more a task for a Plasma applet. So if anyone out there is interested in doing that – please go ahead. I think it could be a cool thing. One basically only has to update whenever a new nmm:TVShow is created or when the new day dawns.
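For anyone picking this up, the core decision is small. The Python sketch below produces lines in the format shown above from an episode list and the last episode already present. The data layout (a dict keyed by season and episode number) and the function name are assumptions of the example and not the layout used by the script in the repository.

```python
from datetime import date


def episode_status(series, episodes, last_seen, today):
    """Describe the episode following last_seen, a (season, number) tuple.

    episodes maps (season, number) to (title, air_date); this layout is an
    assumption of the sketch.
    """
    newer = sorted(key for key in episodes if key > last_seen)
    if not newer:
        return "%s - No new episode found." % series
    season, number = newer[0]
    title, aired = episodes[newer[0]]
    upcoming = aired > today
    return '%s - %s episode "%s" (%02dx%02d) %s %s.' % (
        series,
        "Upcoming" if upcoming else "New",
        title, season, number,
        "will air" if upcoming else "first aired",
        aired.strftime("%d %B %Y"))
```

An applet would call something like this once per series, whenever a new nmm:TVShow appears or the date changes.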
And the cherry on top is of course Bangarang:
Something Dry: Change Notifications
Ignoring the fact that I did not blog in nearly two months I will simply get some developer information out there. Getting notified about changes in the Nepomuk database has always been a problem. All we had for a long time were the ugly statementAdded and statementRemoved signals from Soprano which, when actually used, would slow down the whole system as one would have to check each single statement for the information one needed.
Thus, with the introduction of the Data Management Service a while back we also gave birth to the ResourceWatcher which can be used to watch resources, properties, and types for changes. The concept is simple. Just create an instance of the watcher and tell it which resources or which types of resources you want to watch for changes. In addition you can restrict it to specific properties. Then you get nice signals which inform you about the changes when they happen.
Nepomuk::ResourceWatcher *watcher = new Nepomuk::ResourceWatcher(this);
watcher->addType(NCO::Contact());
connect(watcher, SIGNAL(resourceCreated(Nepomuk::Resource, QList<QUrl>)),
        this, SLOT(slotCreated(Nepomuk::Resource, QList<QUrl>)));
watcher->start();
The problem with this has been that it only works with data manipulation which happens through the Data Management Service and libnepomuk did not use that for a long time. Now we finally fixed that (sadly I did not manage to push it in time for 4.8 but it will be in 4.8.1) and the change notifications become really useful. I also implemented a bunch of unit tests and made sure the most important types of notifications actually work.
So all in all an important step for developers using Nepomuk which was overdue.
Chat logs in Nepomuk
Prototyping is fun. You don't need to care about proper libraries. Your code can be absolutely horrible, cause "Hey! It's just a prototype!"
Yesterday, I started the process of importing my entire gTalk chat history into Nepomuk. It turned out to be a lot simpler than I thought it would be.
Step 1: Get the chat logs
GMail fortunately allows you to export your chat logs via IMAP. They don't implement the traditional XEP-0136 for fetching archived messages. But at least, unlike Facebook, they provide a mechanism.
I landed up using getmail for importing all chat logs.
getmailrc
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
mailboxes = ("[Gmail]/Chats",)
username = *****@gmail.com
password = ********
[destination]
type = Maildir
path = ~/Chats/
I originally wanted to use offlineimap but they seem to have a problem fetching the Chats in GMail.
Step 2: Write a parser
The chat logs are presented in a custom XML format encapsulated in the email. The content was in the traditional quoted-printable format, as most emails are. Writing a parser didn't take too long. Plus, with the new Nepomuk Datamanagement APIs, pushing them into Nepomuk was even simpler.
Ideally, this should be implemented as a strigi analyzer, so that it becomes a part of Nepomuk's Indexing framework. But hey! It's a prototype!
What's the point of having your chat logs in Nepomuk?
Well, for one, the Telepathians can use this to show chat logs. We'll obviously need a better way of importing the chat logs. Manually calling nepomuk-chat-feeder obviously isn't an option. So we'll need to find a proper way of fetching chat logs.
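The prototype's parser is not shown here, and the custom chat XML format is not described, so the following Python sketch only covers the generic first half: undoing the quoted-printable encoding of one fetched mail. It uses Python's standard email package; the function name is made up for the example.

```python
from email import message_from_bytes


def decoded_parts(raw_message):
    """Yield (content_type, text) for every non-multipart part of an email."""
    message = message_from_bytes(raw_message)
    for part in message.walk():
        if part.is_multipart():
            continue
        # decode=True undoes quoted-printable (or base64) transfer encoding
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        yield part.get_content_type(), payload.decode(charset, errors="replace")
```

Each file in the Maildir written by getmail can be read as bytes and passed to this function. The XML part that comes out is what a chat-specific parser would then work on.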
The second, more personal, use is that I finally have a usable dataset to determine important people in my life - based on the chat frequency and timings. AFAIK Facebook internally uses a combination of likes, comments, chat history and stalking to determine how important a person is to you, and accordingly place them higher in the auto-completion list and chat sidebar.
This obviously has many other applications like altering the chat list based on the people you converse with when you're doing one activity.
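As a very rough illustration of ranking people by chat frequency and timing, here is a Python sketch that gives every message a weight which halves every 30 days and sums the weights per contact. The scoring formula, the half-life and the (contact, timestamp) input layout are all assumptions of this example. This is not what Facebook does, and nothing of the sort is in the prototype yet.

```python
from collections import defaultdict
from datetime import datetime


def rank_contacts(messages, now, half_life_days=30.0):
    """Order contacts by recency-weighted message count, most important first.

    messages is an iterable of (contact, sent) pairs with datetime timestamps.
    """
    scores = defaultdict(float)
    for contact, sent in messages:
        age_days = (now - sent).total_seconds() / 86400.0
        scores[contact] += 0.5 ** (age_days / half_life_days)  # exponential decay
    return sorted(scores, key=scores.get, reverse=True)
```

Restricting the input to messages sent during a given activity would give the activity-specific ordering mentioned above.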
Source Code: kde:scratch/vhanda/nepomuk-gtalk-chatlogs
Symbolic Links in Nepomuk – A Solution
Until now symbolic links were not handled in Nepomuk. Today I committed the last patch for the new symlink support in Nepomuk. The solution I chose is not the theoretically perfect one. That would have taken way too much effort while introducing all kinds of possible bugs, regressions, API incompatibilities, and so on. But the solution is nice and clean and simple.
Essentially each direct symlink is indexed as a separate file using the content of its target file. (This is necessary since a direct symlink might have a different file name than the target file.) The interesting part is the indirect symlinks. Indirect symlinks are files in a folder which is a symlink to another folder. An example:
/home/trueg/
|-- subdir/
|   |-- thefile.txt
|-- link/ -> subdir/
    |-- thefile.txt
Here I have a folder “subdir” which contains a file “thefile.txt”. The folder “link” is a direct symlink to “subdir” whereas “link/thefile.txt” is an indirect symlink to “subdir/thefile.txt”.
Indirect symlinks are simply stored as alternative URLs on the target file resources using the kext:altUrl property. (The property is not defined in NIE since it is not theoretically sound with respect to the design of NIE. It needs to be considered a beautiful hack.)
The only situation in which the alternative URLs are actually needed is when searching in a specific folder. Imagine searching in “/home/trueg/link” only. Since there are no nie:url values which match that prefix we need to search the kext:altUrls, too.
The result of all this is that nearly no additional space is required except for the kext:altUrl properties, files are not indexed more than once, and files in symlinked folders are found in addition to “normal” files.
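The folder-restricted search can be pictured with a few lines of Python. The dict standing in for a file resource, with its "url" and "altUrls" keys, is a hypothetical stand-in for nie:url and kext:altUrl; the real check happens inside the SPARQL query, not in application code.

```python
def in_folder(resource, folder):
    """True if the resource's nie:url or any kext:altUrl lies below folder."""
    prefix = folder.rstrip("/") + "/"
    # the primary URL plus all alternative URLs created for indirect symlinks
    urls = [resource["url"]] + resource.get("altUrls", [])
    return any(url.startswith(prefix) for url in urls)
```

With the example tree from above, the single resource for subdir/thefile.txt is found both when searching in /home/trueg/subdir and when searching in /home/trueg/link.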
In my tests everything seems to work nicely but I urge you to test the nepomuk/symlinkHandling branches in kdelibs and kde-runtime and report any problems back to me. The more testing I get the quicker I can merge both into KDE 4.8.
Lastly the pledgie campaign is done but the search for funds goes on:
Finding Duplicate Images Made Easy
It is a typical problem: we downloaded images from a camera, maybe did not delete them from the camera instantly, then downloaded the same images again next time, maybe created an album by copying images into sub-folders (without Nepomuk Digikam can only do so much ;), and so on. Essentially there are a lot of duplicate photos lying around.
But never fear. Just let Nepomuk index all of them and then gather all the duplicates via:
select distinct ?u1 ?u2 where {
  ?f1 a nexif:Photo .
  ?f2 a nexif:Photo .
  ?f1 nfo:hasHash ?h .
  ?f2 nfo:hasHash ?h .
  ?f1 nie:url ?u1 .
  ?f2 nie:url ?u2 .
  filter(?f1!=?f2) .
}
Quick explanation: the query selects all pairs of nexif:Photo resources which have the same hash value but are not the same resource. This of course can be tweaked by adding something like
?f1 nfo:fileName ?fn . ?f2 nfo:fileName ?fn .
to make sure that we only catch the ones that we downloaded more than once. Or we add
?f1 nie:contentCreated ?cc . ?f2 nie:contentCreated ?cc .
to ensure that the photo was actually taken at the same time – although I suppose the probability that two different photos have the same hash value is rather small.
Maybe one last little detail. In theory it would be more correct to do the following:
?f1 nfo:hasHash ?h1 .
?f2 nfo:hasHash ?h2 .
?h1 nfo:hashValue ?h .
?h2 nfo:hashValue ?h .
However, with the introduction of the Data Management Service in KDE 4.7 similar hash resources are merged into one. Thus, the slightly simpler query above. Still, to be sure to also properly handle pre-KDE-4.7 data the above addition might be prudent.
Of course this should be hidden in some application which does the work for you. The point is that Nepomuk has a lot of power that only reveals itself at second glance. :)
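For comparison, the same grouping can be done without Nepomuk by hashing the files directly. The Python sketch below is not part of any Nepomuk tool. It uses SHA-1 as an assumption, since the post does not say which hash the indexer stores, and the function names are made up for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths below root whose contents hash to the same value."""
    by_hash = defaultdict(list)
    for folder, _, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_hash[file_hash(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

The difference to the query is that this has to read every file again, whereas Nepomuk already has the hash values at hand once the files are indexed.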
Manually Forcing the (Re-)Indexing of Folders is Easy
Ever since the unicode bug in Virtuoso 6.1.3 many of us have broken unicode strings in our Nepomuk databases. Completely re-creating the database is IMHO not an option since that would mean losing all manual annotations and things like download source URLs. One solution would be restoring a backup but I simply do not trust the Nepomuk backup until I have had a deeper look into it. The perfect solution would be if Nepomuk could simply fix the data automatically. While that is of course my goal and I am looking into that it will take a while.
In the meantime I threw together a small desktop file which adds two new actions to the context menu of folders.
- (Re-)index Folder contents will make the indexer update all the files in the folder regardless of their state in Nepomuk. This includes fixing broken unicode strings.
- (Re-)index Folder contents recursive does the same as the above except that it also recurses into sub folders.
Simply put the following into a file called “nepomuk-index-folder.desktop” and save it in “~/.kde/share/kde4/services/ServiceMenus”. At the next start of Dolphin or Konqueror the two new actions will be available.
[Desktop Entry] Type=Service X-KDE-ServiceTypes=KonqPopupMenu/Plugin,inode/directory Actions=indexFolder;indexFolderRecursive; X-KDE-Submenu=Desktop Search Icon=nepomuk [Desktop Action indexFolder] Name=(Re-)index Folder contents Icon=nepomuk Exec=qdbus org.kde.nepomuk.services.nepomukfileindexer /nepomukfileindexer org.kde.nepomuk.FileIndexer.indexFolder %f 0 1 [Desktop Action indexFolderRecursive] Name=(Re-)index Folder contents recursive Icon=nepomuk Exec=qdbus org.kde.nepomuk.services.nepomukfileindexer /nepomukfileindexer org.kde.nepomuk.FileIndexer.indexFolder %f 1 1
Update: The code above only works with KDE 4.8 since we renamed the “strigi service” to “file indexing service”. To make it work in KDE 4.7 and before, replace “nepomukfileindexer” with “nepomukstrigiservice” and “FileIndexer” with “Strigi”.
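If you have to support both naming schemes, the substitution from the update can be scripted. The following is only a sketch: the function `service_menu` and its `legacy` flag are made up for illustration and are not part of KDE. The D-Bus names and the trailing arguments are copied verbatim from the desktop file and the update above.

```python
# Sketch: generate the service menu for either naming scheme.
# service_menu() and its "legacy" flag are hypothetical helpers, not KDE API.
import sys

TEMPLATE = """[Desktop Entry]
Type=Service
X-KDE-ServiceTypes=KonqPopupMenu/Plugin,inode/directory
Actions=indexFolder;indexFolderRecursive;
X-KDE-Submenu=Desktop Search
Icon=nepomuk

[Desktop Action indexFolder]
Name=(Re-)index Folder contents
Icon=nepomuk
Exec=qdbus org.kde.nepomuk.services.{service} /{service} org.kde.nepomuk.{iface}.indexFolder %f 0 1

[Desktop Action indexFolderRecursive]
Name=(Re-)index Folder contents recursive
Icon=nepomuk
Exec=qdbus org.kde.nepomuk.services.{service} /{service} org.kde.nepomuk.{iface}.indexFolder %f 1 1
"""


def service_menu(legacy=False):
    """Return the desktop file text; legacy=True uses the KDE 4.7 names."""
    if legacy:
        service, iface = "nepomukstrigiservice", "Strigi"
    else:
        service, iface = "nepomukfileindexer", "FileIndexer"
    return TEMPLATE.format(service=service, iface=iface)


if __name__ == "__main__":
    # Pass --legacy for KDE 4.7 and before; redirect the output
    # into nepomuk-index-folder.desktop.
    sys.stdout.write(service_menu(legacy="--legacy" in sys.argv[1:]))
```

Keeping both variants in one template means the two Exec lines cannot drift apart when the service gets renamed again.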
Soprano 2.7.4 released
Soprano 2.7.4 is another bugfix release in the 2.7 series:
- Enabled large file support (_FILE_OFFSET_BITS=64) to fix large DB file locking on 32bit machines.
- Do not use an event loop when waiting for Virtuoso to initialize.
- In the socket client: simply close the connection in case of a timeout. We cannot recover from it anyway.
Nepomuk Fundraiser – Badamm (Or Some Other Really Clever and Funny Title I Cannot Think of at the Moment)
It happened. Alf Rustad donated the missing 356€ which broke the magical barrier of 9000€ in the Nepomuk Fundraiser I started nearly three months ago.
While the actual goal – securing long-term funding for Nepomuk – has not been reached yet this is a great opportunity to thank Alexander, Alvar, Andreas, Andre, Andrew, Angelo, Anton, Antonio-J, Ardy123, arkub, Baltasar, Bernd, Bernhard, Calogero, Carl, Ceferino, Christopher, Christoph, Claude, Cristiano, Daniel, David, dunkelschorsch, Eduard, Efthymia, Elias, the two Enriques, Fabio, Felix, Florian, Francisco, Friedhelm, Fux, Gael, Giacomo, Giorgio, Guillaume, Günter, Hans, Han, Hartmut, Hector, Hendy, Huftis, Jaroslav, Jérôme, Jesus, Josep, Jos, Jramskov, Juan, Juanjo, Junichi, Kai, Kenneth, Kevin, Kilian, Kulomi, Leopoldo, Linopolus, Luca, Luis, Luiz, Maik, Manoel, Manuel, Marco, Marc, the three Markusses and Martins, Maxime, Mguel, the two Michaels, Mikael, Mike, Morgan, Nicolas, Olaf, Olivier, Orestes, the two Pauls, Paulo, the two Peters, Philipp, Pierre-Hugues, Régis, Robert and Robert, Rodrigo, samtuke, the Sebastians, Simone, Sören, Stefano, Steffen, Stian, tanghus, Thiago, Thomas, Thomas, and Thomas, Tiago, Timothy, Tommi, Tuukka, Ulrich, Wakeley, Xavier, Yaroslav, and all the anonymous donors for their support. You have given me time to keep looking.
A special thanks goes to Carl Symons for his great dot article, his many tips and continuous encouragement.
Thank you also to Peter, George, Ivan, Vishesh, Christian, Andrew, Martin, and Laura for their great developer comments on Nepomuk.
And last but not least thanks for all the positive feedback on my blog articles, the translations into strange and exotic languages such as Spanish :P and all the encouraging words which showed how many actually get what the semantic desktop is all about and want Nepomuk to go on and change the way we work with information today.
The Different Places Something Can Go Wrong
This is just a little blog entry about the impact that the ontologies can have on functionality.
The ontologies are a set of vocabularies describing the types of resources stored in Nepomuk, the possible relations between these types, and the possible annotations. We have for example a type for local files, one for an address book entry, one for a person, one for music content and so on. We also have relations that describe that some person is the author of some piece of content and so on.
These ontologies are maintained in the Shared-Desktop-Ontologies project – to my knowledge the only real open-source project developing RDF ontologies.
Now to the actual topic. There once was a bug. Like so many other bugs it talked about file indexing in Nepomuk and like so many other bugs it said that some file could not be indexed. First it was Nepomuk’s fault, then it was the fault of libstreamanalyzer, but in the end I realized: there was a bug in the ontologies. More specifically in NMM – the Nepomuk MultiMedia ontology. (Granted this was not really the source of the hang the bug talks about but it was the reason the file could not be indexed.)
The problem was the domain of the nmm:setSize property. Each property has a domain and a range – the domain defines on which type of resource the property can be set, the range defines the type of the value. In other words they define the subject and object type of the triple. The domain is always a resource type (rdfs:Class), the range a resource or a literal type (typically one defined in the XML schema). In this case the domain of nmm:setSize was set as nmm:MusicPiece whereas it should have been nmm:MusicAlbum. Thus, Nepomuk rejected the data generated by libstreamanalyzer as being invalid due to using an invalid domain. (Update: Nepomuk treats RDF data in a closed-world fashion. In contrast to the open-world approach which is typical for RDF/S, resource types are not inferred from their relations. In an open-world situation the resource would simply end up being both a nmm:MusicPiece and a nmm:MusicAlbum.)
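To make the difference between the two approaches a bit more tangible, here is a tiny Python sketch. It is not how Nepomuk is implemented: the dictionaries and function names are hypothetical, and subclass relations are ignored to keep it short. Only the property and class names come from the paragraph above.

```python
# Sketch of closed-world domain checking versus open-world inference.
# All names except the nmm: identifiers are hypothetical; this is not Nepomuk code.

# Domain of nmm:setSize before and after the ontology fix.
BROKEN_DOMAINS = {"nmm:setSize": "nmm:MusicPiece"}
FIXED_DOMAINS = {"nmm:setSize": "nmm:MusicAlbum"}


def closed_world_accepts(types, prop, domains):
    """Accept the statement only if the subject already has the domain type."""
    return domains[prop] in types


def open_world_types(types, prop, domains):
    """Infer the domain type from the use of the property instead of rejecting."""
    return set(types) | {domains[prop]}


# The indexer describes an album and sets nmm:setSize on it.
album_types = {"nmm:MusicAlbum"}

# Closed world, broken ontology: the data is rejected.
print(closed_world_accepts(album_types, "nmm:setSize", BROKEN_DOMAINS))  # False
# Closed world, fixed ontology: the data is accepted.
print(closed_world_accepts(album_types, "nmm:setSize", FIXED_DOMAINS))   # True
# Open world, broken ontology: the album silently becomes a music piece, too.
print(sorted(open_world_types(album_types, "nmm:setSize", BROKEN_DOMAINS)))
```

The closed-world check is the stricter of the two, which is why the wrong domain showed up as a failed indexing run instead of as slightly odd data.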
The solution is shared-desktop-ontologies 0.8.1 with the fixed domain. Installing it will make Nepomuk re-parse the changed ontology and indexing the mp3 files in question will finally work.
Well, this was pretty verbose for a rather small issue. Still it gave a little introduction into how the ontologies are used in Nepomuk. One more thing to take care of in the “Nepomuk universe”.







