French Treebank
A richly annotated, manually validated lexical and syntactic resource for linguists, ready for NLP use.
- Project began in 1997, with the support of IUF, CNRS and CNRTL
- 21,550 sentences (approx. 664,500 tokens) from the newspaper Le Monde (1990-1993)
- Metadata: author, date, domain (for each article)
- Validated lexical annotations (categories, sub-categories, inflection, compounds with their components) and syntactic annotations (main constituents, grammatical functions)
- Downloadable in multiple formats (version 1.0, 2016: XML, Tiger-XML, PTB, CoNLL)
The French Treebank is distributed for research purposes. To obtain it, you must first agree to the terms and conditions.
A licence for commercial use is also available; if you need one, please contact us.
Citation: Abeillé, A., L. Clément, and F. Toussenel. 2003. "Building a treebank for French", in A. Abeillé (ed.) Treebanks, Kluwer, Dordrecht, pp. 165-187.
The corpus was annotated with software developed specifically for these tasks (Clément 2001), then systematically corrected by hand.
If you notice a remaining error, first check the guidelines to make sure it is not a deliberate annotation choice; if it is not, please report it.
Examples of syntactic annotation
Each example sentence below is shown in four formats in turn: native XML, PTB bracketing, Tiger-XML, and CoNLL.
Une quinzaine de militaires libériens ont été transférés à Abidjan.
<SENT argument="ETR" author="MINANGOY ROBERT" date="1990-01-19" nb="1015" textID="456">
<NP fct="SUJ">
<w cat="D" ee="D-ind-fs" ei="Dfs" lemma="un" mph="fs" subcat="ind">Une</w>
<w cat="N" ee="N-C-fs" ei="NCfs" lemma="quinzaine" mph="fs" subcat="C">quinzaine</w>
<PP>
<w cat="P" ee="P" ei="P" lemma="de">de</w>
<NP>
<w cat="N" ee="N-C-mp" ei="NCmp" lemma="militaire" mph="mp" subcat="C">militaires</w>
<AP>
<w cat="A" ee="A-qual-mp" ei="Amp" lemma="libérien" mph="mp" subcat="qual">libériens</w>
</AP>
</NP>
</PP>
</NP>
<VN>
<w cat="V" ee="V--P3p" ei="VP3p" lemma="avoir" mph="P3p" subcat="">ont</w>
<w cat="V" ee="V--Kms" ei="VKms" lemma="être" mph="Kms" subcat="">été</w>
<w cat="V" ee="V--Kmp" ei="VKmp" lemma="transférer" mph="Kmp" subcat="">transférés</w>
</VN>
<PP fct="P-OBJ">
<w cat="P" ee="P" ei="P" lemma="à">à</w>
<NP>
<w cat="N" ee="N-P-ms" ei="NPms" lemma="Abidjan" mph="ms" subcat="P">Abidjan</w>
</NP>
</PP>
<w cat="PONCT" ee="PONCT-S" ei="PONCTS" lemma="." subcat="S">.</w>
</SENT>
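The native XML above nests `w` (word) elements inside constituent elements and records the grammatical function in a `fct` attribute. A minimal sketch of reading it with Python's standard `xml.etree.ElementTree` follows. It works on an abridged copy of the sentence (most attributes dropped), and the helper names `words` and `functions` are illustrative, not part of the distribution.

```python
import xml.etree.ElementTree as ET

# Abridged copy of sentence nb="1015"; most attributes are omitted for brevity.
SAMPLE = """
<SENT nb="1015">
  <NP fct="SUJ">
    <w cat="D" lemma="un">Une</w>
    <w cat="N" lemma="quinzaine">quinzaine</w>
  </NP>
  <VN>
    <w cat="V" lemma="avoir">ont</w>
    <w cat="V" lemma="être">été</w>
    <w cat="V" lemma="transférer">transférés</w>
  </VN>
  <PP fct="P-OBJ">
    <w cat="P" lemma="à">à</w>
    <NP>
      <w cat="N" lemma="Abidjan">Abidjan</w>
    </NP>
  </PP>
</SENT>
"""


def words(sent):
    """Return (form, category, lemma) for each <w> element, in document order."""
    return [(w.text, w.get("cat"), w.get("lemma")) for w in sent.iter("w")]


def functions(sent):
    """Return (constituent tag, function) for each element carrying a fct attribute."""
    return [(el.tag, el.get("fct")) for el in sent.iter() if el.get("fct")]


sentence = ET.fromstring(SAMPLE.strip())
print(words(sentence))
print(functions(sentence))
```

This sketch assumes simple words only; compound words, which nest `w` inside `w`, need separate handling.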
(SENT (NP-SUJ (D Une) (N quinzaine) (PP (P de) (NP (N militaires) (AP (A libériens))))) (VN (V ont) (V été) (V transférés)) (PP-P_OBJ (P à) (NP (N Abidjan))) (PONCT .))
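The bracketed line above encodes the same tree, with the function appended to the category after a hyphen (`NP-SUJ`, `PP-P_OBJ`). As a rough sketch, a small recursive parser is enough to read it back into nested tuples. The function names here are hypothetical, and the tokenizer assumes that word forms never contain unescaped parentheses, which this page does not confirm.

```python
import re

LINE = ("(SENT (NP-SUJ (D Une) (N quinzaine) (PP (P de) (NP (N militaires) "
        "(AP (A libériens))))) (VN (V ont) (V été) (V transférés)) "
        "(PP-P_OBJ (P à) (NP (N Abidjan))) (PONCT .))")


def parse_brackets(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                     # a word form
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the word forms of a parsed tree, left to right."""
    out = []
    for child in tree[1]:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out


def split_label(label):
    """Split 'PP-P_OBJ' into ('PP', 'P_OBJ'); the function is None if absent."""
    cat, _, fct = label.partition("-")
    return cat, (fct or None)


tree = parse_brackets(LINE)
print(leaves(tree))
print([split_label(child[0]) for child in tree[1]])
```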
<s id="-1015">
<graph root="-1015_1008">
<terminals>
<t id="-1015_1" word="Une" pos="D" lemma="un" num="s" subcat="ind" gen="f"/>
<t id="-1015_2" word="quinzaine" pos="N" lemma="quinzaine" num="s" subcat="c" gen="f"/>
<t id="-1015_3" word="de" pos="P" lemma="de"/>
<t id="-1015_4" word="militaires" pos="N" lemma="militaire" num="p" subcat="c" gen="m"/>
<t id="-1015_5" word="libériens" pos="A" lemma="libérien" num="p" subcat="qual" gen="m"/>
<t id="-1015_6" word="ont" pos="V" pers="3" lemma="avoir" num="p" tense="pst" mood="ind"/>
<t id="-1015_7" word="été" pos="V" lemma="être" num="s" tense="past" gen="m" mood="part"/>
<t id="-1015_8" word="transférés" pos="V" lemma="transférer" num="p" tense="past" gen="m" mood="part"/>
<t id="-1015_9" word="à" pos="P" lemma="à"/>
<t id="-1015_10" word="Abidjan" pos="N" lemma="Abidjan" num="s" subcat="p" gen="m"/>
<t id="-1015_11" word="." pos="PONCT" lemma="." subcat="s"/>
</terminals>
<nonterminals>
<nt id="-1015_1001" cat="AP" >
<edge label="" idref="-1015_5"/>
</nt>
<nt id="-1015_1002" cat="NP" >
<edge label="" idref="-1015_4"/>
<edge label="" idref="-1015_1001"/>
</nt>
<nt id="-1015_1003" cat="PP" >
<edge label="" idref="-1015_3"/>
<edge label="" idref="-1015_1002"/>
</nt>
<nt id="-1015_1004" cat="NP" fct="SUJ">
<edge label="" idref="-1015_1"/>
<edge label="" idref="-1015_2"/>
<edge label="" idref="-1015_1003"/>
</nt>
<nt id="-1015_1005" cat="VN" >
<edge label="" idref="-1015_6"/>
<edge label="" idref="-1015_7"/>
<edge label="" idref="-1015_8"/>
</nt>
<nt id="-1015_1006" cat="NP" >
<edge label="" idref="-1015_10"/>
</nt>
<nt id="-1015_1007" cat="PP" fct="P_OBJ">
<edge label="" idref="-1015_9"/>
<edge label="" idref="-1015_1006"/>
</nt>
<nt id="-1015_1008" cat="SENT" >
<edge label="SUJ" idref="-1015_1004"/>
<edge label="" idref="-1015_1005"/>
<edge label="P_OBJ" idref="-1015_1007"/>
<edge label="" idref="-1015_11"/>
</nt>
</nonterminals>
</graph>
</s>
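In the Tiger-XML version above, the tree is stored as a graph: `t` elements are the words, `nt` elements are the constituents, and each `edge` points to a child by `idref`. The sketch below, which uses only the `à Abidjan` subgraph as sample data, walks that graph from the root and prints it as a bracketing. The function name `to_brackets` is illustrative, and the output format simply mirrors the PTB line shown on this page.

```python
import xml.etree.ElementTree as ET

# Abridged subgraph of sentence -1015 (the PP "à Abidjan" only).
SAMPLE = """
<s id="-1015">
<graph root="-1015_1007">
<terminals>
<t id="-1015_9" word="à" pos="P" lemma="à"/>
<t id="-1015_10" word="Abidjan" pos="N" lemma="Abidjan"/>
</terminals>
<nonterminals>
<nt id="-1015_1006" cat="NP" >
<edge label="" idref="-1015_10"/>
</nt>
<nt id="-1015_1007" cat="PP" fct="P_OBJ">
<edge label="" idref="-1015_9"/>
<edge label="" idref="-1015_1006"/>
</nt>
</nonterminals>
</graph>
</s>
"""


def to_brackets(s_elem):
    """Render the graph of one <s> element as a bracketed string."""
    graph = s_elem.find("graph")
    terminals = {t.get("id"): t for t in graph.iter("t")}
    nonterminals = {nt.get("id"): nt for nt in graph.iter("nt")}

    def render(node_id):
        if node_id in terminals:
            t = terminals[node_id]
            return f"({t.get('pos')} {t.get('word')})"
        nt = nonterminals[node_id]
        label = nt.get("cat")
        if nt.get("fct"):             # append the function, as in the PTB line
            label += "-" + nt.get("fct")
        kids = " ".join(render(e.get("idref")) for e in nt.findall("edge"))
        return f"({label} {kids})"

    return render(graph.get("root"))


print(to_brackets(ET.fromstring(SAMPLE.strip())))
```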
1 Une un D DET sentid=flmf3_01000_01499ep-1015|g=f|n=s|s=ind 2 det 2 det
2 quinzaine quinzaine N NC g=f|n=s|s=c 8 suj 8 suj
3 de de P P _ 2 dep 2 dep
4 militaires militaire N NC g=m|n=p|s=c 3 obj.p 3 obj.p
5 libériens libérien A ADJ g=m|n=p|s=qual 4 mod 4 mod
6 ont avoir V V m=ind|n=p|p=3|t=pst 8 aux.tps 8 aux.tps
7 été être V VPP g=m|m=part|n=s|t=past 8 aux.pass 8 aux.pass
8 transférés transférer V VPP g=m|m=part|n=p|t=past 0 root 0 root
9 à à P P _ 8 p_obj 8 p_obj
10 Abidjan Abidjan N NPP g=m|n=s|s=p 9 obj.p 9 obj.p
11 . . PONCT PONCT s=s 8 ponct 8 ponct
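The CoNLL lines above have ten whitespace-separated columns per token. The sketch below reads them into dictionaries and finds the root and its dependents. The column names follow the CoNLL-X convention, which these lines appear to match (an assumption, not stated on this page); the distributed files may be tab-separated, in which case `split("\t")` would be the safer choice.

```python
# Column names assumed from the CoNLL-X convention (ten columns per token).
FIELDS = ("id", "form", "lemma", "cpos", "pos",
          "feats", "head", "deprel", "phead", "pdeprel")

LINES = [
    "2 quinzaine quinzaine N NC g=f|n=s|s=c 8 suj 8 suj",
    "6 ont avoir V V m=ind|n=p|p=3|t=pst 8 aux.tps 8 aux.tps",
    "7 été être V VPP g=m|m=part|n=s|t=past 8 aux.pass 8 aux.pass",
    "8 transférés transférer V VPP g=m|m=part|n=p|t=past 0 root 0 root",
]


def parse_line(line):
    """Turn one token line into a dict, with integer ids and a feature dict."""
    row = dict(zip(FIELDS, line.split()))
    row["id"] = int(row["id"])
    row["head"] = int(row["head"])
    feats = {}
    if row["feats"] != "_":           # "_" marks an empty feature column
        for item in row["feats"].split("|"):
            key, _, value = item.partition("=")
            feats[key] = value
    row["feats"] = feats
    return row


rows = [parse_line(line) for line in LINES]
root = next(r for r in rows if r["head"] == 0)
dependents = [(r["form"], r["deprel"]) for r in rows if r["head"] == root["id"]]
print(root["form"], dependents)
```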
Aussi s’est-elle évertuée à torpiller tous les projets en faveur de Rhône-Rhin.
<SENT argument="ECO" author="FAUJAS ALAIN" date="1990-01-19" nb="1067" textID="464">
<w cat="ADV" ee="ADV" ei="ADV" lemma="aussi">Aussi</w>
<VN fct="SUJ">
<w cat="CL" ee="CL-refl-3fs" ei="CL3fs" lemma="il" mph="3fs" subcat="refl">s'</w>
<w cat="V" ee="V--P3s" ei="VP3s" lemma="être" mph="P3s" subcat="">est</w>
<w cat="CL" ee="CL-suj-3fs" ei="CL3fs" lemma="il" mph="3fs" subcat="suj">-elle</w>
<w cat="V" ee="V--Kfs" ei="VKfs" lemma="évertuer" mph="Kfs" subcat="">évertuée</w>
</VN>
<VPinf fct="A-OBJ">
<w cat="P" ee="P" ei="P" lemma="à">à</w>
<VN>
<w cat="V" ee="V--W" ei="VW" lemma="torpiller" mph="W" subcat="">torpiller</w>
</VN>
<NP fct="OBJ">
<w cat="A" ee="A-ind-mp" ei="Amp" lemma="tout" mph="mp" subcat="ind">tous</w>
<w cat="D" ee="D-def-mp" ei="Dmp" lemma="le" mph="mp" subcat="def">les</w>
<w cat="N" ee="N-C-mp" ei="NCmp" lemma="projet" mph="mp" subcat="C">projets</w>
<PP>
<w cat="P" compound="yes" ee="P" ei="P" lemma="en faveur de">
<w catint="P">en</w>
<w catint="N">faveur</w>
<w catint="P">de</w>
</w>
<NP>
<w cat="N" ee="N-P-ms" ei="NPms" lemma="Rhône" mph="ms" subcat="P">Rhône</w>
<w cat="PONCT" ee="PONCT-W" ei="PONCTW" lemma="-" subcat="W">-</w>
<w cat="N" ee="N-P-ms" ei="NPms" lemma="Rhin" mph="ms" subcat="P">Rhin</w>
</NP>
</PP>
</NP>
</VPinf>
<w cat="PONCT" ee="PONCT-S" ei="PONCTS" lemma="." subcat="S">.</w>
</SENT>
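In the XML above, the compound `en faveur de` is a `w` element marked `compound="yes"` whose components are nested `w` elements carrying a `catint` attribute instead of `cat`. A minimal sketch of collecting such compounds with their components, using an abridged copy of the `PP` (the helper name `compounds` is hypothetical):

```python
import xml.etree.ElementTree as ET

# Abridged copy of the PP "en faveur de Rhône-Rhin" from sentence nb="1067".
SAMPLE = """
<PP>
  <w cat="P" compound="yes" ee="P" ei="P" lemma="en faveur de">
    <w catint="P">en</w>
    <w catint="N">faveur</w>
    <w catint="P">de</w>
  </w>
  <NP>
    <w cat="N" lemma="Rhône">Rhône</w>
    <w cat="PONCT" lemma="-">-</w>
    <w cat="N" lemma="Rhin">Rhin</w>
  </NP>
</PP>
"""


def compounds(elem):
    """Return (lemma, category, [(component form, component category), ...])."""
    found = []
    for w in elem.iter("w"):
        if w.get("compound") == "yes":
            # Direct <w> children are the components of the compound.
            parts = [(c.text, c.get("catint")) for c in w.findall("w")]
            found.append((w.get("lemma"), w.get("cat"), parts))
    return found


print(compounds(ET.fromstring(SAMPLE.strip())))
```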
(SENT (ADV Aussi) (VN-SUJ (CL s') (V est) (CL -elle) (V évertuée)) (VPinf-A_OBJ (P à) (VN (V torpiller)) (NP-OBJ (A tous) (D les) (N projets) (PP (P (P en) (N faveur) (P de)) (NP (N Rhône) (PONCT -) (N Rhin))))) (PONCT .))
<s id="-1067">
<graph root="-1067_1008">
<terminals>
<t id="-1067_1" word="Aussi" pos="ADV" lemma="aussi"/>
<t id="-1067_2" word="s'" pos="CL" pers="3" lemma="il" num="s" subcat="refl" gen="f"/>
<t id="-1067_3" word="est" pos="V" pers="3" lemma="être" num="s" tense="pst" mood="ind"/>
<t id="-1067_4" word="-elle" pos="CL" pers="3" lemma="il" num="s" subcat="suj" gen="f"/>
<t id="-1067_5" word="évertuée" pos="V" lemma="évertuer" num="s" tense="past" gen="f" mood="part"/>
<t id="-1067_6" word="à" pos="P" lemma="à"/>
<t id="-1067_7" word="torpiller" pos="V" lemma="torpiller" mood="inf"/>
<t id="-1067_8" word="tous" pos="A" lemma="tout" num="p" subcat="ind" gen="m"/>
<t id="-1067_9" word="les" pos="D" lemma="le" num="p" subcat="def" gen="m"/>
<t id="-1067_10" word="projets" pos="N" lemma="projet" num="p" subcat="c" gen="m"/>
<t id="-1067_11" word="en" pos="P" catint="P"/>
<t id="-1067_12" word="faveur" pos="N" catint="N"/>
<t id="-1067_13" word="de" pos="P" catint="P"/>
<t id="-1067_14" word="Rhône" pos="N" lemma="Rhône" num="s" subcat="p" gen="m"/>
<t id="-1067_15" word="-" pos="PONCT" lemma="-" subcat="w"/>
<t id="-1067_16" word="Rhin" pos="N" lemma="Rhin" num="s" subcat="p" gen="m"/>
<t id="-1067_17" word="." pos="PONCT" lemma="." subcat="s"/>
</terminals>
<nonterminals>
<nt id="-1067_1001" cat="VN" fct="SUJ">
<edge label="" idref="-1067_2"/>
<edge label="" idref="-1067_3"/>
<edge label="" idref="-1067_4"/>
<edge label="" idref="-1067_5"/>
</nt>
<nt id="-1067_1002" cat="VN" >
<edge label="" idref="-1067_7"/>
</nt>
<nt id="-1067_1003" cat="P" lemma="en_faveur_de" compound="yes">
<edge label="" idref="-1067_11"/>
<edge label="" idref="-1067_12"/>
<edge label="" idref="-1067_13"/>
</nt>
<nt id="-1067_1004" cat="NP" >
<edge label="" idref="-1067_14"/>
<edge label="" idref="-1067_15"/>
<edge label="" idref="-1067_16"/>
</nt>
<nt id="-1067_1005" cat="PP" >
<edge label="" idref="-1067_1003"/>
<edge label="" idref="-1067_1004"/>
</nt>
<nt id="-1067_1006" cat="NP" fct="OBJ">
<edge label="" idref="-1067_8"/>
<edge label="" idref="-1067_9"/>
<edge label="" idref="-1067_10"/>
<edge label="" idref="-1067_1005"/>
</nt>
<nt id="-1067_1007" cat="VPinf" fct="A_OBJ">
<edge label="" idref="-1067_6"/>
<edge label="" idref="-1067_1002"/>
<edge label="OBJ" idref="-1067_1006"/>
</nt>
<nt id="-1067_1008" cat="SENT" >
<edge label="" idref="-1067_1"/>
<edge label="SUJ" idref="-1067_1001"/>
<edge label="A_OBJ" idref="-1067_1007"/>
<edge label="" idref="-1067_17"/>
</nt>
</nonterminals>
</graph>
</s>
1 Aussi aussi ADV ADV sentid=flmf3_01000_01499ep-1067 5 mod 5 mod
2 s' le/lui CL CLR g=f|n=s|p=3|s=refl 5 aff 5 aff
3 est être V V m=ind|n=s|p=3|t=pst 5 aux.tps 5 aux.tps
4 -elle il CL CLS g=f|n=s|p=3|s=suj 5 suj 5 suj
5 évertuée évertuer V VPP g=f|m=part|n=s|t=past 0 root 0 root
6 à à P P _ 5 a_obj 5 a_obj
7 torpiller torpiller V VINF m=inf 6 obj.p 6 obj.p
8 tous tout A ADJ g=m|n=p|s=ind 10 mod 10 mod
9 les le D DET g=m|n=p|s=def 10 det 10 det
10 projets projet N NC g=m|n=p|s=c 7 obj 7 obj
11 en en P P mwehead=P+|pred=y 10 dep 10 dep
12 faveur faveur N NC g=f|n=s|s=c|pred=y 11 dep_cpd 11 dep_cpd
13 de de P P pred=y 11 dep_cpd 11 dep_cpd
14 Rhône Rhône N NPP g=m|n=s|s=p 11 obj.p 11 obj.p
15 - - PONCT PONCT s=w 14 ponct 14 ponct
16 Rhin Rhin N NPP g=m|n=s|s=p 14 mod 14 mod
17 . . PONCT PONCT s=s 5 ponct 5 ponct
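In the CoNLL lines above, the components of the compound `en faveur de` are attached to its first token with the relation `dep_cpd`. The sketch below merges such components back into a single unit. Joining with underscores mirrors the `en_faveur_de` lemma of the Tiger-XML version, but the function itself is only an illustration and assumes that every `dep_cpd` token attaches directly to the head of its compound.

```python
LINES = [
    "11 en en P P mwehead=P+|pred=y 10 dep 10 dep",
    "12 faveur faveur N NC g=f|n=s|s=c|pred=y 11 dep_cpd 11 dep_cpd",
    "13 de de P P pred=y 11 dep_cpd 11 dep_cpd",
    "14 Rhône Rhône N NPP g=m|n=s|s=p 11 obj.p 11 obj.p",
]


def read_rows(lines):
    """Keep only (id, form, head, deprel) from each token line."""
    rows = []
    for line in lines:
        f = line.split()
        rows.append((int(f[0]), f[1], int(f[6]), f[7]))
    return rows


def merge_compounds(rows):
    """Join each compound head with its dep_cpd components, in token order."""
    parts = {}
    for tid, form, head, deprel in rows:
        if deprel == "dep_cpd":
            parts.setdefault(head, []).append((tid, form))
    merged = []
    for tid, form, head, deprel in rows:
        if deprel == "dep_cpd":
            continue                  # already absorbed into its head
        pieces = sorted([(tid, form)] + parts.get(tid, []))
        merged.append("_".join(piece for _, piece in pieces))
    return merged


print(merge_compounds(read_rows(LINES)))
```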
La diminution paraît, toutefois, moins nette en France et en Italie.
<SENT argument="ECO" author="LEMONDE" date="1990-01-19" nb="1093" textID="467">
<NP fct="SUJ">
<w cat="D" ee="D-def-fs" ei="Dfs" lemma="le" mph="fs" subcat="def">La</w>
<w cat="N" ee="N-C-fs" ei="NCfs" lemma="diminution" mph="fs" subcat="C">diminution</w>
</NP>
<VN>
<w cat="V" ee="V--P3s" ei="VP3s" lemma="paraître" mph="P3s" subcat="">paraît</w>
</VN>
<w cat="PONCT" ee="PONCT-W" ei="PONCTW" lemma="," subcat="W">,</w>
<w cat="ADV" ee="ADV" ei="ADV" lemma="toutefois">toutefois</w>
<w cat="PONCT" ee="PONCT-W" ei="PONCTW" lemma="," subcat="W">,</w>
<AP fct="ATS">
<w cat="ADV" ee="ADV" ei="ADV" lemma="moins">moins</w>
<w cat="A" ee="A-qual-fs" ei="Afs" lemma="net" mph="fs" subcat="qual">nette</w>
</AP>
<PP fct="MOD">
<w cat="P" ee="P" ei="P" lemma="en">en</w>
<NP>
<w cat="N" ee="N-P-fs" ei="NPfs" lemma="France" mph="fs" subcat="P">France</w>
</NP>
<COORD>
<w cat="C" ee="C-C" ei="CC" lemma="et" subcat="C">et</w>
<PP>
<w cat="P" ee="P" ei="P" lemma="en">en</w>
<NP>
<w cat="N" ee="N-P-fs" ei="NPfs" lemma="Italie" mph="fs" subcat="P">Italie</w>
</NP>
</PP>
</COORD>
</PP>
<w cat="PONCT" ee="PONCT-S" ei="PONCTS" lemma="." subcat="S">.</w>
</SENT>
(SENT (NP-SUJ (D La) (N diminution)) (VN (V paraît)) (PONCT ,) (ADV toutefois) (PONCT ,) (AP-ATS (ADV moins) (A nette)) (PP-MOD (P en) (NP (N France)) (COORD (C et) (PP (P en) (NP (N Italie))))) (PONCT .))
<s id="-1093">
<graph root="-1093_1009">
<terminals>
<t id="-1093_1" word="La" pos="D" lemma="le" num="s" subcat="def" gen="f"/>
<t id="-1093_2" word="diminution" pos="N" lemma="diminution" num="s" subcat="c" gen="f"/>
<t id="-1093_3" word="paraît" pos="V" pers="3" lemma="paraître" num="s" tense="pst" mood="ind"/>
<t id="-1093_4" word="," pos="PONCT" lemma="," subcat="w"/>
<t id="-1093_5" word="toutefois" pos="ADV" lemma="toutefois"/>
<t id="-1093_6" word="," pos="PONCT" lemma="," subcat="w"/>
<t id="-1093_7" word="moins" pos="ADV" lemma="moins"/>
<t id="-1093_8" word="nette" pos="A" lemma="net" num="s" subcat="qual" gen="f"/>
<t id="-1093_9" word="en" pos="P" lemma="en"/>
<t id="-1093_10" word="France" pos="N" lemma="France" num="s" subcat="p" gen="f"/>
<t id="-1093_11" word="et" pos="C" lemma="et" subcat="c"/>
<t id="-1093_12" word="en" pos="P" lemma="en"/>
<t id="-1093_13" word="Italie" pos="N" lemma="Italie" num="s" subcat="p" gen="f"/>
<t id="-1093_14" word="." pos="PONCT" lemma="." subcat="s"/>
</terminals>
<nonterminals>
<nt id="-1093_1001" cat="NP" fct="SUJ">
<edge label="" idref="-1093_1"/>
<edge label="" idref="-1093_2"/>
</nt>
<nt id="-1093_1002" cat="VN" >
<edge label="" idref="-1093_3"/>
</nt>
<nt id="-1093_1003" cat="AP" fct="ATS">
<edge label="" idref="-1093_7"/>
<edge label="" idref="-1093_8"/>
</nt>
<nt id="-1093_1004" cat="NP" >
<edge label="" idref="-1093_10"/>
</nt>
<nt id="-1093_1005" cat="NP" >
<edge label="" idref="-1093_13"/>
</nt>
<nt id="-1093_1006" cat="PP" >
<edge label="" idref="-1093_12"/>
<edge label="" idref="-1093_1005"/>
</nt>
<nt id="-1093_1007" cat="COORD" >
<edge label="" idref="-1093_11"/>
<edge label="" idref="-1093_1006"/>
</nt>
<nt id="-1093_1008" cat="PP" fct="MOD">
<edge label="" idref="-1093_9"/>
<edge label="" idref="-1093_1004"/>
<edge label="" idref="-1093_1007"/>
</nt>
<nt id="-1093_1009" cat="SENT" >
<edge label="SUJ" idref="-1093_1001"/>
<edge label="" idref="-1093_1002"/>
<edge label="" idref="-1093_4"/>
<edge label="" idref="-1093_5"/>
<edge label="" idref="-1093_6"/>
<edge label="ATS" idref="-1093_1003"/>
<edge label="MOD" idref="-1093_1008"/>
<edge label="" idref="-1093_14"/>
</nt>
</nonterminals>
</graph>
</s>
1 La le D DET sentid=flmf3_01000_01499ep-1093|g=f|n=s|s=def 2 det 2 det
2 diminution diminution N NC g=f|n=s|s=c 3 suj 3 suj
3 paraît paraître V V m=ind|n=s|p=3|t=pst 0 root 0 root
4 , , PONCT PONCT s=w 3 ponct 3 ponct
5 toutefois toutefois ADV ADV _ 3 mod 3 mod
6 , , PONCT PONCT s=w 3 ponct 3 ponct
7 moins moins ADV ADV _ 8 mod 8 mod
8 nette net A ADJ g=f|n=s|s=qual 3 ats 3 ats
9 en en P P _ 3 mod 3 mod
10 France France N NPP g=f|n=s|s=p 9 obj.p 9 obj.p
11 et et C CC s=c 9 coord 9 coord
12 en en P P _ 11 dep.coord 11 dep.coord
13 Italie Italie N NPP g=f|n=s|s=p 12 obj.p 12 obj.p
14 . . PONCT PONCT s=s 3 ponct 3 ponct
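The CoNLL lines above show how coordination is encoded in this sample: the conjunction `et` depends on the first conjunct with the relation `coord`, and the second conjunct depends on the conjunction with `dep.coord`. The sketch below recovers the conjuncts of `en France et en Italie` under that reading. It generalizes from this single example, so the helper names and the pattern itself should be treated as assumptions.

```python
LINES = [
    "9 en en P P _ 3 mod 3 mod",
    "10 France France N NPP g=f|n=s|s=p 9 obj.p 9 obj.p",
    "11 et et C CC s=c 9 coord 9 coord",
    "12 en en P P _ 11 dep.coord 11 dep.coord",
    "13 Italie Italie N NPP g=f|n=s|s=p 12 obj.p 12 obj.p",
]


def read_rows(lines):
    """Keep only (id, form, head, deprel) from each token line."""
    rows = []
    for line in lines:
        f = line.split()
        rows.append((int(f[0]), f[1], int(f[6]), f[7]))
    return rows


def conjuncts(rows, first_id):
    """Return the ids of all conjuncts, starting from the first one."""
    result = [first_id]
    for cid, _, chead, cdeprel in rows:
        if cdeprel == "coord" and chead == first_id:
            # Tokens attached to the conjunction by dep.coord are conjuncts.
            result.extend(tid for tid, _, head, deprel in rows
                          if deprel == "dep.coord" and head == cid)
    return result


def object_of(rows, prep_id):
    """Return the form of the obj.p dependent of a preposition, if any."""
    for _, form, head, deprel in rows:
        if deprel == "obj.p" and head == prep_id:
            return form
    return None


rows = read_rows(LINES)
print([object_of(rows, tid) for tid in conjuncts(rows, 9)])
```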