Publications, tools and derived resources
This is a short list of works based on or derived from the French Treebank. If you know of a work not listed here, please send us a message.
The French Treebank is constituency-based. A dependency-based version has been automatically derived from it (Candito et al. 2009; 2010). A categorial grammar version has been derived by Moot (2015).
Derived resources
- Universal Dependencies version of the FTB: Seddah et al. 2018
- FrameNet annotations (105 frames) on a subset of the FTB: Djemaa et al. 2016
- Categorial grammar, Moot 2015
- French (gold) corpus for the SPMRL 2013 shared task: 38 files from the FTB, converted to constituent format with dependency annotations
- Aix-en-Provence Corpus (2012): 1,471 sentences from the FTB (approx. 26,000 words), annotated with property grammars (FTB-LPL) (Blache and Rauzy 2012)
- LPL Physiological corpus (2012): 198 sentences from the FTB (6,572 tokens) annotated with reading times and eye movements (Rauzy and Blache 2012)
- Nomage: semantics-driven nominalization lexicon (Balvet et al. 2009)
- Dependency corpus from Alpage (FTB-DEP), 25 files: 12,500 sentences (2008 version) (Candito et al. 2009)
- Lexicon of French adjectives (TreeLex): 2,200 adjectives with valence from the FTB (Kupść 2008)
- Lexicon of French verbs (TreeLex): 2,000 verbs with valence from the FTB (Kupść and Abeillé 2008)
- Dublin Corpus (2007): 4,741 sentences from the FTB (134,445 words): Modified FTB (MFTB)
- French Discourse Treebank: annotated corpus for discourse analysis
- French part of the Dundee Corpus (Kennedy, Hill & Pynte, 2003): 52,173 tokens annotated with reading times and eye movements
The annotation scheme of the FTB has been used to annotate other resources:
- Sequoia treebank: 3099 sentences (medical, Europarl, frwiki, Est républicain) (Candito and Seddah 2012)
- Question bank: 2000 sentences (corpus of questions) (Seddah and Candito 2016)
- The French social media bank: corpus of tweets and blog posts (Seddah et al. 2012)
- Spoken treebank: 2118 sentences (transcriptions from France Inter and C-oral-rom) (Abeillé and Crabbé 2013)
- 4-couv: 3500 sentences (back-cover blurbs: 500 texts) (Blache et al. 2015)
Derived tools
- Flemm: Lemmatization from Atilf (Namer 2000)
- Morfette: POS tagging and lemmatization (2008). Chrupała et al. (2008), Seddah et al. (2010)
- SEM: POS tagging of French (Constant et al. 2011)
- Syntactic analysis and POS tagging from Alpage (Melt) : Denis and Sagot (2009)
- Syntactic analysis with dependencies (Bonsai)
- Syntactic analysis from LIF (Macaon): Nasr et al.
- POS tagging from LPL (Marsa Tag): Rauzy et al. 2014
Publications
2019
- Abeillé, Anne, Clément, L., Liégeois, Loïc. – "Un corpus annoté pour le français : le French Treebank", TAL, 60 : 2.19-43.
2018
- Seddah, Djamé, De La Clergerie, Eric, Sagot, Benoît, et al. – "Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer", LREC
2016
- Abeillé, Anne, Hemforth, Barbara, Winckel, Elodie. – "Les relatives en dont : études empiriques". In Proceedings of the 5th CMLF, Tours.
- Djemaa, Marianne, Candito, Marie, Muller, P. et al. – "Corpus annotation within the French Framenet: methodology and results". In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC). – Portorož (Slovenia)
- Seddah, Djamé, Candito, Marie. – "Hard Time Parsing Questions: Building a QuestionBank for French". In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC). – Portorož (Slovenia)
2015
- Blache, Philippe, Montcheuil, Grégoire, Rauzy, Stéphane, et al. Création d’un nouveau treebank à partir de quatrièmes de couverture. Actes TALN. p. 480-486, 2015
- Danlos, Laurence, Colinet, Margot, Steinlin, Jacques. – FDTB1, première étape du projet « French Discourse Treebank » : repérage des connecteurs de discours en corpus
- Moot, Richard. – "A type-logical treebank for French". – Journal of Language Modelling, Vol. 3, No. 1 (2015), p. 229–264
- Steinlin, Jacques, Colinet, Margot, Danlos, Laurence. – "FDTB1 : Repérage des connecteurs de discours en corpus". In Traitement automatique du langage naturel, June 2015, Caen (France)
2014
- Crabbé, Benoît. – "Un analyseur discriminant de la famille LR pour l'analyse en constituants". – Actes TALN, Marseille
- Dupont, Yoann, Tellier, Isabelle. – "Un reconnaisseur d’entités nommées du Français". In Actes TALN, 2014
- Hale, John T. – "Surprisal and Chunking". In Automaton Theories of Human Sentence Comprehension. – Stanford : Center for the Study of Language and Information, 2014. – p. 91-99
- Ribeyre, Corentin, Candito, Marie, Seddah, Djamé. – "Semi-Automatic Deep Syntactic Annotations of the French Treebank". The 13th International Workshop on Treebanks and Linguistic Theories (TLT13), Dec 2014, Tübingen
2013
- Abeillé, A., Crabbé, B. – "Vers un treebank du français parlé", Actes TALN, Les Sables-d’Olonne.
- Seddah, Djamé, Tsarfaty, Reut, Kübler, Sandra, et al. – "Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages". In Proceedings of the Fourth SPMRL Workshop. – Seattle, USA, 2013.
2012
- Blache, Philippe. – "Estimating Constraint Weights from Treebanks". In Proceedings of CSLP, 2012
- Blache, Philippe, Rauzy, Stéphane. – "Hybridization and Treebank Enrichment with Constraint-Based Representations". In Proceedings of LREC, 2012
- Blache, Philippe, Rauzy, Stéphane. – "Enrichissement du FTB : un treebank hybride constituants/propriétés". In Actes de TALN, 2012
- Boudin, Florian, Hernandez, Nicolas. – "Détection et correction automatique d'erreurs d'annotation morpho-syntaxique du French TreeBank". In Actes TALN, 2012
- Candito, Marie, Seddah, Djamé. – "Effectively long-distance dependencies in French: annotation and parsing evaluation", The 11th International Workshop on Treebanks and Linguistic Theories (TLT11)
- Candito, Marie, Seddah, Djamé. – "Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical", Actes TALN
- Guillaume, Bruno, Perrier, Guy. – "Annotation sémantique du French Treebank à l’aide de la réécriture modulaire de graphes". In Actes TALN, 2012
- Constant, Matthieu, Sigogne, Anthony, Watrin, Patrick. – "La reconnaissance des mots composés à l’épreuve de l’analyse syntaxique et vice-versa : évaluation de deux stratégies discriminantes". In Conférence sur le Traitement Automatique des Langues Naturelles, June 2012, p. 57–70.
- Asadullah, Munshi, Paroubek, Patrick, Vilnat, Anne. – "Bidirectionnal converter between syntactic annotations: from French Treebank Dependencies to PASSAGE annotations, and back". In Proceedings of LREC, 2012
- Rauzy, Stéphane, Blache, Philippe. – "Robustness and processing difficulty models. A pilot study for eye-tracking data on the French Treebank". In Proceedings of Eye-tracking and NLP workshop (COLING), 2012
- Sagot, Benoît, Richard, Marion, Stern, Rosa. – "Annotation référentielle du Corpus Arboré de Paris 7 en entités nommées". In Actes TALN, vol. 2. – Grenoble, June 2012, p. 535-542
- Seddah, Djamé, Sagot, Benoît, Candito, Marie et al. – "The French Social Media Bank: a Treebank of Noisy User Generated Content", Proceedings of COLING, Mumbai
2011
- Green, Spence, de Marneffe, Marie-Catherine, Bauer, John et al. – "Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French". In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. – p. 725–735
2010
- Candito, M.-H., Crabbé, B., Denis, P. – "Statistical French dependency parsing: treebank conversion and first results". – Proceedings of LREC'2010, Valletta (Malta), 2010
- Moot, Richard. – "Wide-Coverage French Syntax and Semantics using Grail". – TALN, Montréal, 19–23 July 2010
2009
- Candito, M.-H., Crabbé, Benoît, Denis, P., Guérin, F. – "Analyse syntaxique du français : des constituants aux dépendances", Proceedings of TALN 2009, Senlis (France), 2009
- Denis, Pascal, Sagot, Benoît. "Coupling an annotated corpus and a morphosyntactic lexicon for state-of-the-art POS tagging with less human effort". PACLIC 2009, 2009
- Fabre, C., Kupść, A. – "Large and noisy vs small and reliable: combining 2 types of corpora for adjective valence extraction". In Mahlberg, M., González-Díaz, V., Smith, C. (ed.), Corpus Linguistics Conference, Liverpool, 20-23 July 2009
- Moreau E., Tellier I., Balvet A., Laurence G., Rozenknop A., Poibeau T. – Annotation fonctionnelle de corpus arboré avec des Champs Aléatoires Conditionnels, Actes de la conférence TALN 2009, 24-26 June 2009, Senlis.
- Pynte, J., New, B. & Kennedy, A. – "On-line syntactic and semantic influences in reading revisited", Journal of Eye Movement Research, 3(1):5, 1-12.
2008
- Chrupała, Grzegorz, Dinu, Georgiana and van Genabith, Josef. – Learning Morphology with Morfette. LREC 2008, 2008.
- Kupść, A., Abeillé, A. – "Adjectives in TreeLex". In Kłopotek, M., Przepiórkowski, A., Wierzchoń, S., Trojanowski, K. (ed.), 16th International Conference Intelligent Information Systems, Zakopane (Poland), 16-18 June 2008, Academic Publishing House EXIT, p. 287-296
- Kupść, A., Abeillé, A. – "Growing TreeLex". In Gelbukh, A. (ed.), 9th International Conference (CICLing), Haifa (Israel), February 2008, p. 28-39 (Lecture Notes in Computer Science, 4919)
- Kupść, A., Abeillé, A. – "TreeLex: A Subcategorisation Lexicon for French Verbs". In Proceedings of the First International Conference on Global Interoperability for Language Resources, Hong Kong, 9-11 January 2008
- Pynte, Joel, New, Boris, Kennedy, Alan. – "A multiple regression analysis of syntactic and semantic influences in reading normal text". – Journal of Eye Movement Research, 2(1):4, 1-11
2007
- Schluter, Natalie, van Genabith, Josef. – "Preparing, Restructuring and Augmenting a French Treebank: Lexicalised Parsing or Coherent Treebanks". In Proceedings of the 10th Conference of the Pacific Association of Computational Linguistics (PACLING), Melbourne (Australia), 2007
2006
- Nasr, Alexis. – "Grammaires de dépendances génératives probabilistes. Modèle théorique et application à un corpus arboré du français". In Traitement Automatique des Langues, vol. 46, n° 1, April 2006.
2005
- Arun, Abhishek, Keller, Frank. – "Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French". In Proceedings of the 43rd Annual Meeting of the ACL. – p. 306–313
2004
- Abeillé A. & Barrier N. (2004). Enriching a French treebank. In Proceedings of the LREC04 Conference, Lisbon.
2003
- Abeillé, A., L. Clément, and F. Toussenel. 2003. "Building a treebank for French", in A. Abeillé (ed) Treebanks, Kluwer, Dordrecht. (p.165-187)
2001
- Clément, Lionel. – Construction et exploitation d'un corpus syntaxiquement annoté pour le français, 2001 (doctoral thesis, Université Paris 7)
2000
- Abeillé, Anne, Clément, L., Kinyon, A. – "Building a treebank for French". In Proceedings of LREC. – Athens (Greece)
- Namer, F. – "Flemm : Un analyseur Flexionnel de Français à base de règles". T.A.L. 41 : 523-547, 2000.
1998
- Abeillé, Anne, Clément, L. – "A reference tagged corpus for French". In Proceedings of LREC. – Granada