About
This site provides supplemental material and information about the paper Learning Temporal Semantic Relations in Tweets.
Abstract. In this paper, we investigate whether semantic relationships between entities can be learnt from analyzing microblog posts published on Twitter. We identify semantic links between persons, products, events and other entities. We develop a relation discovery framework that allows for the detection of typed relations that moreover may have temporal dynamics. Based on a large Twitter dataset, we evaluate different strategies and show that co-occurrence based strategies allow for high precision and perform particularly well for relations between persons and events achieving precisions of more than 80%. We further analyze the performance in learning relationships that are valid only for a certain time period and reveal that for those types of relationships Twitter is a suitable source as it allows for discovering trending topics with higher accuracy and with lower delay in time than traditional news media.
Slides presented at ICWE:
1. Datasets
Tweets: Over a period of more than two months (from the end of October to the beginning of January), we crawled the Twitter information streams of more than 20,000 users. Together, these users published more than 10 million tweets.
News: To link tweets with news articles, we also monitored more than 60 RSS feeds of prominent news media such as BBC, CNN or the New York Times and aggregated the content of 77,544 news articles.
Semantics: Given the content of Twitter messages and news articles, we extract entities to better understand the semantics of Twitter activities. For this purpose, we use OpenCalais.
name | number of records | description |
tweets.sql.gz (643MB) | 2316204 | sample of tweets processed with OpenCalais |
news.sql.gz (73MB) | 77544 | news articles monitored from 62 news media websites |
sementicsTweetsEntity.sql.gz (71MB) | 1896328 | entity assignments extracted from tweets (1,051,524); 709,245 distinct entities (39 different types of entities) |
sementicsNewsEntity.sql.gz (40MB) | 1216570 | entity assignments extracted from news (63,140); 170,577 distinct entities (39 different types of entities) |
2. Relations
In this paper, we analyzed relations between different types of entities. In particular, we analyzed the following 71 types of relations.
- Person and City
- Person and Country
- Person and Organization
- Person and Company
- Organization and City
- Organization and Country
- Company and City
- Company and Country
- Company and Product
- Currency and Country
- Currency and Continent
- Holiday and Country
- Technology and Person
- Technology and Company
- NaturalFeature and City
- NaturalFeature and Country
- NaturalFeature and Continent
- NaturalFeature and Region
- MedicalCondition and Person
- MedicalCondition and MedicalTreatment
- MedicalCondition and City
- MedicalCondition and Country
- MedicalCondition and Continent
- MedicalCondition and Region
- Region and City
- Region and Country
- Region and Continent
- Organization and PhoneNumber
- Organization and EmailAddress
- Company and PhoneNumber
- Company and FaxNumber
- Company and EmailAddress
- Organization and FaxNumber
- Person and EmailAddress
- Person and FaxNumber
- Person and PhoneNumber
- City and EntertainmentAwardEvent
- Country and EntertainmentAwardEvent
- Continent and EntertainmentAwardEvent
- Person and EntertainmentAwardEvent
- MusicAlbum and EntertainmentAwardEvent
- MusicGroup and EntertainmentAwardEvent
- Movie and EntertainmentAwardEvent
- ProgrammingLanguage and OperatingSystem
- Product and OperatingSystem
- MusicAlbum and MusicGroup
- Person and TVShow
- TVShow and TVStation
- RadioStation and RadioProgram
- Person and PoliticalEvent
- City and PoliticalEvent
- Country and PoliticalEvent
- Region and PoliticalEvent
- City and SportsEvent
- Country and SportsEvent
- Region and SportsEvent
- Person and SportsEvent
- SportsGame and SportsEvent
- SportsGame and SportsLeague
- SportsLeague and TVStation
- SportsEvent and TVStation
- SportsEvent and RadioStation
- SportsLeague and RadioStation
- SportsLeague and Country
- Person and Movie
- Person and MusicGroup
- Person and MusicAlbum
- URL and Company
- URL and Organization
- Person and Position
- Person and Person
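The abstract refers to co-occurrence based strategies for discovering typed relations such as the ones listed above. The following is a minimal sketch of that general idea, not the implementation used in the paper. It assumes that each post has already been reduced to a list of (entity name, entity type) pairs; the function name `cooccurrence_counts`, the data layout and the placeholder entity names are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(posts, type_a, type_b):
    """Count in how many posts an entity of type_a appears together
    with an entity of type_b.

    posts: iterable of lists of (entity_name, entity_type) pairs
           (hypothetical layout, one list per tweet or news article).
    Returns a Counter mapping (name_a, name_b) to the number of posts.
    """
    counts = Counter()
    for entities in posts:
        unique = set(entities)  # count each entity once per post
        left = [name for name, etype in unique if etype == type_a]
        right = [name for name, etype in unique if etype == type_b]
        pairs = set()
        for a in left:
            for b in right:
                if type_a == type_b:
                    if a == b:
                        continue  # an entity is not related to itself
                    # same-type relations are symmetric: store one ordering
                    pairs.add(tuple(sorted((a, b))))
                else:
                    pairs.add((a, b))
        counts.update(pairs)  # each pair counts once per post
    return counts


# Placeholder data for illustration only.
posts = [
    [("Person A", "Person"), ("Event X", "SportsEvent")],
    [("Person A", "Person"), ("Event X", "SportsEvent"), ("City C", "City")],
    [("Person B", "Person"), ("Person A", "Person")],
]

person_event = cooccurrence_counts(posts, "Person", "SportsEvent")
person_person = cooccurrence_counts(posts, "Person", "Person")
```

Ranking the pairs by count, for example with `person_event.most_common()`, yields candidate relations for one relation type. Restricting `posts` to a time window would be one way to look at relations that hold only for a certain period.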
3. Comment on Findings
In our paper, we saw that relationships between events (e.g. SportsEvent or PoliticalEvent) and other entities can be discovered with high precision, while the precision for relationships among persons (Person) is rather low. We confirmed these findings on the ground truth obtained from DBpedia. The figure below shows additional results that complement the results shown in Figure 6(b). It is interesting to see that the relative order of product and person relationships is different here. When diving into the concrete relationships among products (e.g. Product and OperatingSystem) and persons (e.g. Person and Person), we see that many products (e.g. specific mobile phones) have no DBpedia URI. As a consequence, a relationship between a product such as the N900 and the operating system Android would be classified as not related even though the N900 features Android 2.3.
Hence, the performance reported above (as well as in the paper) would increase further if these issues were taken into account.
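To illustrate the effect of entities that are missing from the ground truth, here is a minimal sketch of a precision computation. It is an assumption about how such an evaluation could be set up, not the evaluation code behind the paper. The function name `precision`, the optional `known_entities` filter and the placeholder pairs are hypothetical.

```python
def precision(predicted, ground_truth, known_entities=None):
    """Fraction of predicted pairs that appear in the ground truth.

    predicted:      list of (entity_a, entity_b) pairs proposed as related
    ground_truth:   set of pairs known to be related
    known_entities: optional set of entities that exist in the ground
                    truth source; if given, predicted pairs that mention
                    an entity outside this set are skipped instead of
                    being counted as wrong.
    """
    if known_entities is not None:
        predicted = [
            (a, b) for a, b in predicted
            if a in known_entities and b in known_entities
        ]
    if not predicted:
        return 0.0
    hits = sum(1 for pair in predicted if pair in ground_truth)
    return hits / len(predicted)


# Placeholder data for illustration only: "Product P" has no entry in the
# ground truth source, so its relation cannot be confirmed there.
predicted = [
    ("Person A", "Event X"),
    ("Product P", "OS Q"),
    ("Person B", "Event Y"),
]
ground_truth = {("Person A", "Event X")}
known_entities = {"Person A", "Event X", "Person B", "Event Y", "OS Q"}

strict = precision(predicted, ground_truth)                    # 1 of 3 pairs
filtered = precision(predicted, ground_truth, known_entities)  # 1 of 2 pairs
```

In this toy setting, the pair involving the unknown product lowers the strict precision, while the filtered variant leaves it out of the calculation.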